RATE OF COMPREHENSION OF READING:
ITS MEASUREMENT AND ITS RELATION 
TO COMPREHENSION 


PAUL BLOMMERS AND E. F. LINDQUIST 
State University of Iowa 


INTRODUCTION 


The relationship between reading rate and comprehension has 
been extensively studied with widely varying results.* These 
variations are probably due primarily to differences in the 
methods of measurement employed, or in the manner in which 
reading rate and comprehension have been defined. Insufficient 
attention has been paid, in general, to defining terms precisely 
or to assuring the validity of the measures used. The common 
practice has been simply to employ existing measuring instru- 
ments without regard to their appropriateness in a study of 
relationship. As a consequence, investigators have differed 
considerably both in results obtained and in their interpretation. 
The purpose of this article is to describe an investigation of the 
relationship between reading rate and comprehension which 
was based upon tests especially developed to yield measures 
valid for use in such a study. 


CRITERIA OF APPROPRIATENESS OF TESTS FOR A STUDY 
OF RELATIONSHIP 


The rate at which a given individual reads depends upon a 
wide variety of factors. It depends, of course, upon the nature 
of the material read—upon its content, difficulty, typographical 
features, etc. It depends, also, upon certain characteristics 
of the reader—upon his reading skills and habits, his experiential 
background, his physical, mental, and emotional status, his
acuity of vision, etc. It depends, finally, upon the purpose for
which the reading is done. For example, the reader may wish
only to determine in general terms what the material is about.
He may seek only to answer certain specific questions, or to
locate certain specific facts. On the other hand, he may stop to
ponder over the importance or the implication of the ideas
presented. He may pause to memorize some of them. He may
consider ways in which he can make possible use of these ideas.
He may reorganize them to fit them better to his needs. He
may think of new questions as he reads and re-read or continue
reading with these questions in mind. What he does while he
reads depends, then, on the purpose for which he undertook the
reading in the first place or upon purposes which develop during
the course of the reading. The nature of these purposes is
conditioned by his educational background.

* Correlation coefficients ranging from -.47 to .92 have been reported.

Whether or not one considers all of these acts as a part of 
the reading process in the technical sense, or whether or not one 
considers all of the skills and the abilities exercised as reading 
skills, it must be acknowledged that these acts take place during 
reading and that a part of the time each individual spends in 
reading is given over to just such acts. It is apparent, then, 
that there is no meaningful ‘single’ reading rate (in words per 
minute) for any given individual, but that, instead, he reads 
at many different rates, each specific to a different purpose. 
It follows also, that for any given group of individuals, the 
degree of relationship between reading rate and any other variable 
may depend upon the purpose or purposes for which they read, 
that this relation will be most meaningful only if they all read 
for the same purpose, and that, accordingly, there may be many 
different degrees of relationship, each specific to a different 
purpose. 

It is, therefore, essential in a relationship study that the 
purpose of the reading be very carefully defined and controlled. 
This means, of course, that the purpose must be set in advance 
for all readers and must be clearly understood by them. Obvi- 
ously, many of the purposes suggested above, especially those 
initiated during the course of the reading, can not be so defined 
and controlled. Hence, any rate-comprehension relationship 
study which is to yield meaningful results is limited not only 
by the fact that it must be based on reading done by a specific
group for a specific purpose, but also by the fact that the purpose
for which the reading is done must be of such a nature that it 
can be defined and controlled. 

In spite of these limitations, adequacy of definition and close- 
ness of control of reading purpose are adopted as the first criteria 
of the tests used in a relationship study. Reading rate can then
be based upon the time required to accomplish the set purpose. 
It is particularly important that the control be adequate to 
prevent reading for self-initiated purposes which might lead 
some readers to set higher standards of comprehension for them- 
selves than others do, or to engage in reading activities far beyond 
those demanded by the set purpose. Each reader should spend 
only as much time on any passage as is needed to accomplish the 
set purpose, and no more. Few, if any, of the previously avail- 
able rate of reading tests have satisfied these first criteria. 

A third criterion, possibly implicit in the first two, is that 
the rate and comprehension scores both be based on the same 
materials, or on materials equivalent in content and read for 
equivalent purposes. It is obviously not meaningful to relate 
a measure of rate based on simple material read for one purpose 
to a measure of comprehension based on highly dissimilar and
more difficult material read for another purpose; yet, this is 
just what has been done in many of the earlier rate-comprehension 
relationship studies. The same control of purpose, then, must 
extend to both the rate and the comprehension tests used in a 
relationship study, and the relationship established must be 
considered as unique to that purpose. 

Just as it is important to control the reading so as to prevent 
the reader from reading for additional and irrelevant purposes of 
his own or to prevent him from trying to comprehend more than 
the set purpose demands, so is it also important to guarantee 
that the measure of rate is based upon as much comprehension 
as that purpose demands. The reader should spend only the 
minimum time needed to accomplish the set purpose, but unless 
he does accomplish that purpose the time spent should not be 
considered in determining his rate. If rate of comprehension is 
regarded as rate of comprehension at a certain level, then com- 
prehension both above and below that level must be ruled out 
as much as possible in the determination of rate. This is a new 
principle in the measurement of reading rate, but its validity
may be readily demonstrated in terms of a concrete example.
Suppose a test consists of ten independent exercises of increasing 
difficulty, each of which is to be read only for the purpose of 
finding the answer to a question based upon the passage (the 
question being read in advance of the passage). Suppose that A 
is able to answer all ten questions, but that B and C are each 
able to answer only the first five. Suppose, however, that for 
some time B stubbornly continues reading and rereading the 
last five passages in a vain effort to comprehend them sufficiently 
to answer the questions, but that C recognizes at once that they 
are beyond his comprehension and quits trying after a single 
reading. Clearly, the total time spent on the ten passages is 
not a valid basis for comparing the reading ‘rates’ (in words per 
minute) of any two of these three readers. Comparable rate 
measures could be derived from the first five passages, but for 
the last five the times spent by B and C are indicative, not of 
the rate at which they comprehend at the level set by the ques- 
tions, but of the rate at which they fail to comprehend at this 
level, as well as of certain personality differences between them. 
A fourth criterion, then, is that the reading rate score should 
be based only upon time spent in comprehending at a certain level 
of comprehension. Time spent in failing to comprehend at this 
level should not be combined with time spent in successfully 
comprehending at this level. 

The preceding illustration will also serve to illustrate a fifth 
criterion. It has been noted that the ten passages vary in 
difficulty. If so, it is quite apparent that the rate at which an 
individual reads an easy passage may differ considerably from 
that at which he reads a difficult passage of the same length. 
It is quite conceivable, however, that his relative reading rate 
in a given group of readers may be approximately the same for 
both passages—that is, his rank in the group, in order of time 
spent, may be approximately the same for both passages, since 
the more difficult passage may be as much more time-consuming 
for the others as it is for him. If this is the case, one could 
secure comparable rate measures for two individuals, even though 
they did not read the same passages, if the rate measures were 
composites or averages based on relative rates for individual 
passages, rather than upon absolute rates. This treatment 
would make it possible to employ passages of moderately differing*
content and difficulty, and yet to base the rate measure on all
passages read by each individual, even though different indi- 
viduals read different passages (which is what would happen if 
the rate measures were to be based only upon passages success- 
fully read). Using passages of varying content—for example, 
passages dealing with a representative sample of differing topics 
from the whole field of the social studies—would have the 
advantage of making the rate measure more highly generalized 
and more widely useful and meaningful. A fifth criterion, then, 
is that the rate score should be based on relative rates established 
separately for individual passages of varying content and 
difficulty. 

Certain rate-comprehension relationship studies have used 
comprehension tests which were administered in a time limit so 
short that even the fastest readers could not complete the test. 
As a result, the comprehension scores depended upon rate of 
comprehension as well as upon power to comprehend. Such 
comprehension scores correlated with rate scores would lead to 
an overestimate of the degree of relationship between rate and 
power of comprehension. Hence a sixth criterion is that the 
comprehension scores used in a rate-comprehension relationship 
study should be work-limit scores or measures of power to 
comprehend. 

Several recent relationship studies have used comprehension 
tests to determine what was called a ‘rate of comprehension’ 
score. These comprehension tests usually consisted of a number 
of paragraphs or short selections with one or more questions over 
each. The working time was so limited that the fastest reader 
could not possibly complete the test. The ‘rate of compre- 
hension’ score was taken to be simply the number of questions 
correctly answered within the time set. This procedure, of course, 
fails to distinguish between time spent in successful reading and 
time spent in unsuccessful reading, and consequently is in 
violation of the fourth criterion. What is of equal importance, 
however, is the fact that such rate scores measure power of 
comprehension as well as rate of comprehension. Hence, such 
rate scores if correlated with power of comprehension scores would 
lead to an overestimate of the rate-power relationship. Therefore,
a seventh criterion is that the rate score be independent of
power of comprehension.

* As stated in the third criterion, the materials used should not
systematically differ from those used in measuring comprehension.

454 The Journal of Educational Psychology


SUMMARY OF CRITERIA FOR TESTS OF RATE OF COMPREHENSION 
AND POWER OF COMPREHENSION TO BE USED 
IN A RELATIONSHIP STUDY 


1) The purpose for which the reading is done should be clearly 
defined. 

2) The purpose for which the reading is done should be closely 
controlled. 

3) Both rate and comprehension scores should be based on the 
same materials or on equivalent materials read for equivalent 
purposes. 

4) The rate score should be based only upon time spent in 
accomplishing the set purpose. Time spent in failing to accom- 
plish the set purpose or in accomplishing other purposes should 
be ignored in deriving the rate score. 

5) The rate score should be a composite derived from a number 
of relative rates established separately for individual passages 
of varying content and difficulty. 

6) The comprehension score should be a work-limit score or a 
measure of power to comprehend. 

7) The rate score should be independent of power to com- 
prehend. 


A GENERAL DESCRIPTION OF THE TEST 


The test constructed for the purpose of this study consisted of 
a series of independent reading exercises. Each exercise con- 
sisted, in order, of (a) a question, (b) a reading selection con- 
taining the answer to the question, and (c) four or more suggested 
answers to the question, one of which is definitely better than any 
of the others. A sample exercise follows. 


SAMPLE EXERCISE 


According to the following paragraph, why was the life of the 
indentured servant of colonial times apt to involve greater hardships 
than the life of a slave? 

Indentured servants composed the largest dependent class during 
the Seventeenth Century in America. These were workers who 
served for a limited period of time under a labor contract in return for
their transportation to the Colonies. What treatment the servants
received depended upon the disposition of their masters, although it is 
safe to assume that most of them endured a pretty hard lot. The 
work performed was not only inherently strenuous and virtually 
without end, but the master who could command the labor of the 
servant only a few years had no material interest in him save that of 
obtaining the maximum service within a short time. In this respect, 
servitude wore a harsher aspect than chattel slavery: the owner of a 
slave would at least conserve his personal property. The legal right 
of the master to whip the servant often placed excessive power in 
irresponsible hands and in extreme cases masters were accused of 
whipping the servants to death. One observer said of Maryland 
servants that “they groan beneath a worse than Egyptian bondage.”

The life of an indentured servant was apt to involve greater hard- 
ship than the life of a slave because: 


(a) the work they performed was inherently strenuous and vir- 
tually without end 

(b) by and large the people who were masters of indentured serv- 
ants had bad dispositions 

(c) of the temporary nature of the period of servitude 

(d) masters had the legal right to whip a servant and in extreme 
cases whipped them to death. 


The selections were chosen and questions formulated to test a 
relatively high order of comprehension. The subjects were not 
asked to read to find facts specifically stated in the passage but 
rather were called upon for interpretations, inferences, evalua- 
tions, etc. In all, one hundred twenty exercises were con- 
structed, covering a wide variety of topics. After preliminary 
trial the forty best exercises were chosen and assembled into two 
equivalent forms of twenty exercises each. 


THE ADMINISTRATION OF THE TEST 


Both forms of the test were administered to six hundred sev- 
enty-two juniors and seniors enrolled in four Iowa high schools. 
The forms were arranged in alternate order for distribution to 
the students, so that half took Form A first and half took Form B 
first. Both forms were administered in a single sitting. 

To measure the time spent by each subject on each exercise, 
consecutive numbers were prominently displayed to the group at 
intervals of ten seconds during the test period. The testees were 
directed to proceed as follows with each exercise:

1) Record the time (i.e., the number then being displayed) of
beginning work on the exercise

2) Read the question

3) Read the article for the express purpose of finding the answer
to the question

4) Indicate the choice of the best answer on the answer sheet

5) Record the time finished (i.e., the number then being displayed).


The subjects were also directed to read at a rate which seemed 
to them personally the most efficient for the accomplishment of 
the purpose set. Speed was made secondary to accomplishment 
of purpose. The pupils were fully aware that they were timing 
themselves on each exercise. This awareness was deemed suffi- 
cient to prevent them from spending time in excess of that needed 
to accomplish the purpose set. 

The pupils were further directed to complete each exercise in 
order, and never to turn back to an exercise once it had been com- 
pleted and the time finished recorded. To facilitate compliance 
with this request each exercise was printed in full on a single 
sheet of the test booklet. 


THE USE OF THE McCALL T-TRANSFORMATION 


The form of the distributions of scores on any educational test
depends in part upon the difficulty and discriminatory power of
the test items. Inasmuch as the type of relationship between 
sets of scores may be affected by the forms of their distributions, 
it is possible to vary the relationship by manipulating the content 
of the test. To control this factor the McCall'* (see pp. 505-8) 
transformation was applied to the distributions of rate and com- 
prehension scores obtained in this study. 
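
The McCall transformation is usually described as a normalizing transformation: each raw score is replaced by its percentile rank in the group, the percentile rank is converted to the corresponding deviate of the unit normal curve, and the deviate is expressed on a scale with a mean of 50 and a standard deviation of 10. A minimal sketch under that reading follows. The function name `mccall_t` and the mid-rank treatment of tied scores are assumptions made for illustration and are not details given in this article.

```python
from statistics import NormalDist


def mccall_t(scores):
    """Return normalized T-scores (mean 50, SD 10) for a list of raw scores."""
    n = len(scores)
    t_scores = []
    for x in scores:
        below = sum(1 for y in scores if y < x)
        ties = sum(1 for y in scores if y == x)
        # Mid-rank percentile: proportion below plus half the proportion tied.
        # (Assumed convention; it keeps p strictly between 0 and 1.)
        p = (below + 0.5 * ties) / n
        z = NormalDist().inv_cdf(p)  # unit-normal deviate for that percentile
        t_scores.append(50 + 10 * z)
    return t_scores
```

Because the transformed scores depend only on rank order, two tests whose raw-score distributions differ in shape yield T-score distributions of approximately the same normal form, which is the control sought here.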


THE RATE MEASURE 


The raw measure of the time required by an individual to work 
a given exercise was taken to be the difference between the time 
of beginning and finishing work as recorded by that individual. 
The first exercise of each form was used as a practice exercise, and 
the raw time score on it was disregarded. 
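
The derivation of the raw time scores from the recorded numbers may be sketched as follows. The conversion to seconds, the function name `raw_times`, and the layout of the records are assumptions made for illustration; the article states only that the difference between the two recorded numbers was used and that the first exercise was disregarded.

```python
TICK_SECONDS = 10  # numbers were displayed at ten-second intervals


def raw_times(records, practice_exercise=1):
    """Raw time per exercise for one subject on one form.

    records: {exercise_number: (start_number, finish_number)}, where each
    number is the one on display when the subject began or finished.
    """
    times = {}
    for exercise, (start, finish) in records.items():
        if exercise == practice_exercise:
            continue  # the practice exercise is not scored
        times[exercise] = (finish - start) * TICK_SECONDS
    return times
```

Under this scheme no time is measured more finely than ten seconds, so small differences between raw times on a single exercise carry little information.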

In accordance with the fourth and fifth of the criteria previ- 
ously stated, the rate measure used in this study was a composite 
of the rate measures separately obtained for only those exercises
to which the subject responded correctly. It may be considered
as a measure of the rate at which the subject comprehends at the 
levels demanded by the questions. 

The composite rate measure which was derived from the rate 
measures for the separate selections is rather complex in charac- 
ter, and may be best explained by considering first a somewhat 
simpler ‘unadjusted’ composite which was also obtained for each 
subject. The steps in deriving this ‘unadjusted’ composite were 
as follows: 

1) For each exercise, a distribution was made of the raw time 
scores of all subjects who responded correctly to that exercise. 

2) For each of these distributions, the raw time scores were 
transformed into McCall T-scores, i.e., into measures of relative 
rate. 

3) For each subject, the mean of his T-scores was computed 
(for only those exercises to which he responded correctly). 

4) A distribution of these mean T-scores of all subjects was 
prepared, and these means were themselves transformed into 
McCall T-scores. This T-score for any subject is known as his 
‘unadjusted’ composite rate score. 
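
The four steps above can be sketched as follows. The helper `normalized_t`, the function `unadjusted_composite`, and the data layout are hypothetical names introduced for illustration. It is also assumed here that the T-scores are oriented so that a shorter time yields a higher score, which the sketch achieves by negating the times before transformation.

```python
from statistics import NormalDist, mean


def normalized_t(values):
    """Normalized T-scores (mean 50, SD 10) using mid-rank percentiles."""
    n = len(values)
    out = []
    for x in values:
        below = sum(1 for y in values if y < x)
        ties = sum(1 for y in values if y == x)
        p = (below + 0.5 * ties) / n
        out.append(50 + 10 * NormalDist().inv_cdf(p))
    return out


def unadjusted_composite(times, correct):
    """times: {subject: {exercise: seconds}};
    correct: {subject: set of exercises answered correctly}."""
    per_subject = {s: [] for s in times}
    exercises = sorted({e for t in times.values() for e in t})
    for e in exercises:
        # Step 1: only subjects who responded correctly to exercise e.
        group = [s for s in times
                 if e in times[s] and e in correct.get(s, set())]
        if not group:
            continue
        # Step 2: T-scores within that group; negated so faster is higher.
        t_scores = normalized_t([-times[s][e] for s in group])
        for s, t in zip(group, t_scores):
            per_subject[s].append(t)
    # Step 3: mean T-score per subject over the exercises answered correctly.
    means = {s: mean(ts) for s, ts in per_subject.items() if ts}
    # Step 4: the means are themselves converted to T-scores.
    subjects = list(means)
    return dict(zip(subjects, normalized_t([means[s] for s in subjects])))
```

A subject who answers no exercise correctly receives no composite in this sketch, since no successful reading time is available for him.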

It should be noted first that the meaningfulness of this com- 
posite depends upon the inter-relationships between the separate 
rate measures for the individual selections. If the subject’s 
relative rate (T-score) tends to be the same for an easy selection 
as for a difficult one, or the same for one type of content as for 
another, it is meaningful to speak of his ‘general’ reading rate, 
and to compute an estimate of it by averaging his T-scores for 
separate selections. If, on the other hand, the subject’s relative 
rate actually differs widely from selection to selection, then he 
has no ‘general’ relative rate at all, but only specific relative rates 
for different types of materials. The procedure here employed 
assumes that the latter is not the case. Some evidence support- 
ing this assumption will be presented later. 

There remains another factor, however, which theoretically 
invalidates the use of this ‘unadjusted’ rate score in a rate-comprehension
relationship study. If a relationship does exist
between rate and comprehension, then the better readers who 
succeed on the more difficult exercises will also be the faster 
readers, and under the procedure described an individual who 
actually ranks fairly high in rate may make very low T-scores on
the difficult items, since the distributions of T-scores for those
items will be based on a selected group of fast readers. 

To avoid this possible defect of the unadjusted score, the T- 
scores for the more difficult exercises were adjusted so as to make 
them more comparable to those for the easier exercises, and the 
rate score computed from these adjusted T-scores. The steps by 
which this adjustment was made may be illustrated in the case of 
exercise 18 in Form B (18B), which was a relatively difficult 
exercise. 

1) Exercise 2 in Form A, on which nearly all subjects suc- 
ceeded, was selected as a ‘base’ exercise. 

2) For only those subjects who responded correctly to both 
exercises 2A and 18B, percentile graphs of (a) their T-scores on 
2A (determined as previously described), and (b) their raw time 
scores on 18B were drawn on the same chart. From this chart it 
was possible to read, for each subject, the T-score on 2A which 
was ‘equivalent’ to his raw time score on 18B, i.e., which had the 
same percentile rank in this selected group. This equivalent T- 
score for each subject was taken as his ‘adjusted’ T-score on 18B.* 
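
The graphical procedure in step 2 amounts to matching percentile ranks within the group that succeeded on both exercises. A rough numerical counterpart is sketched below. The function names, the mid-rank percentiles, and the linear interpolation between ordered values are all assumptions standing in for the hand-drawn percentile graphs, and the times are negated on the assumption that a shorter time corresponds to a higher T-score.

```python
def percentile_ranks(values):
    """Mid-rank percentile (between 0 and 1) of each value within the list."""
    n = len(values)
    return [
        (sum(1 for y in values if y < x)
         + 0.5 * sum(1 for y in values if y == x)) / n
        for x in values
    ]


def value_at_percentile(values, p):
    """Read a value off the percentile curve of `values` at percentile p."""
    ordered = sorted(values)
    n = len(ordered)
    # The k-th ordered value (counting from 0) is placed at (k + 0.5) / n.
    pos = p * n - 0.5
    if pos <= 0:
        return ordered[0]
    if pos >= n - 1:
        return ordered[-1]
    k = int(pos)
    return ordered[k] + (pos - k) * (ordered[k + 1] - ordered[k])


def adjusted_t(base_t, target_time):
    """base_t: {subject: T-score on the base exercise};
    target_time: {subject: raw time on the difficult exercise}.
    Only subjects present in both (correct on both exercises) are used."""
    subjects = [s for s in base_t if s in target_time]
    # Negate times so that a short time earns a high percentile rank.
    ranks = percentile_ranks([-target_time[s] for s in subjects])
    base_values = [base_t[s] for s in subjects]
    return {s: value_at_percentile(base_values, p)
            for s, p in zip(subjects, ranks)}
```

The effect is that the fastest reader of the difficult exercise within the selected group receives the highest base-exercise T-score found in that group, rather than a score computed within the selected group alone.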

For each subject a composite T-score was obtained from the 
‘adjusted’ T-scores for all exercises upon which he succeeded in 
the same manner in which the ‘unadjusted’ composite T-score 
was obtained. This T-score for any subject is known as his 
‘adjusted’ composite rate measure. 


* Actually this procedure was repeated three additional times, using
base exercises 3A, 2B, and 3B with exercise 18B. The four adjusted T-scores
thus obtained were averaged and this average was used as the subject’s
‘adjusted’ rate score on 18B. The purpose of this procedure was to smooth
out irregularities due to errors in measurement.


OTHER RATE MEASURES YIELDED BY THE TEST


Certain a priori reasons have been presented for basing an index
of reading rate only upon the exercises which are successfully
read. It may also be shown experimentally that this procedure
is defensible. To accomplish this, an index of rate for the exercises
failed was needed. Accordingly, the raw time scores on each
exercise, regardless of success or failure to comprehend, were
converted into McCall T-scores. From these rate T-scores on individual
exercises the following composites were obtained for each
of three hundred individuals selected at random from the larger
group.

1) The S-rate score. This score was derived by computing for 
each individual the mean of the T-scores on only those exercises 
successfully solved and by converting these means into McCall 
T-scores. 

2) The F-rate score. This score was derived by computing 
for each individual the mean of the T-scores on only the exercises 
failed and by converting these means into McCall T-scores. 

3) The A-rate score. This score was derived by computing 
for each individual the mean of the T-scores on all exercises, 
regardless of success or failure, and by converting these means 
into McCall T-scores. 
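
The three composites differ only in which exercises enter the mean, as the following sketch shows for a single subject. The function name `sfa_means` and the data layout are hypothetical, and the final conversion of the means into McCall T-scores across subjects is omitted.

```python
from statistics import mean


def sfa_means(t_scores, correct):
    """t_scores: {exercise: rate T-score} for one subject, computed on all
    subjects regardless of success; correct: exercises answered correctly."""
    succeeded = [t for e, t in t_scores.items() if e in correct]
    failed = [t for e, t in t_scores.items() if e not in correct]
    return {
        "S": mean(succeeded) if succeeded else None,  # successes only
        "F": mean(failed) if failed else None,        # failures only
        "A": mean(list(t_scores.values())),           # all exercises
    }
```

A subject with no failures has no F-rate mean, so in this sketch the F-rate score is undefined for him.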

Three other ‘rate’ scores were derived from the test. These 
are defined below: 

1) The Time-Limit Comprehension (TLC) score. The TLC- 
score was based on the number of exercises correctly solved in a 
relatively short period of working time. The period set was 
twenty minutes for each form.* The McCall transformation was 
applied. This score confounds comprehension and rate. Other 
investigators of the rate-comprehension relationship have used 
this type of rate measure, calling it a ‘rate of comprehension’ 
score. ‘:5.6.14 

2) The Working-Time (WT) score. The WT-score was based 
on the total time required to complete the test. The McCall 
transformation was applied. This score ignores the fact that 
some of the reading done does not come up to the minimum level 
of comprehension, so that time spent in failing to comprehend is 
confounded with time spent in successful reading. Among the 
investigators of the rate-comprehension relationship who have 
used this type of rate score are Abel,' King,'® and Henderson.* 

3) The Time-Limit Amount (TLA) score. The TLA-score 
was based on the number of words read in twenty minutes. 
Again the McCall transformation was used. Like the WT-score, 
this score confounds time spent in successful and unsuccessful 
reading. This is a very common method of measuring reading 
rate. It has usually been based on amounts read in two to five 
minutes. 
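
How these three raw scores might be obtained from one subject's recorded times is sketched below, before any McCall transformation. The article does not describe the computation, so the function name, the data layout, and in particular the proportional credit for an exercise only partly read when the limit expires are assumptions.

```python
def time_limit_scores(exercises, limit_seconds=20 * 60):
    """exercises: list of (seconds_spent, word_count, answered_correctly)
    tuples in the order worked."""
    elapsed = 0
    tlc = 0    # exercises correctly solved within the time limit
    tla = 0.0  # words read within the time limit
    for seconds, words, is_correct in exercises:
        if elapsed + seconds <= limit_seconds:
            tla += words
            tlc += int(is_correct)
        elif elapsed < limit_seconds:
            # Assumption: a partly read exercise is credited in proportion
            # to the share of its working time that fell inside the limit.
            tla += words * (limit_seconds - elapsed) / seconds
        elapsed += seconds
    # WT is the total working time, with no regard to success or failure.
    return {"TLC": tlc, "WT": elapsed, "TLA": tla}
```

The sketch makes the confounding visible: TLC rises with both speed and correctness, while WT and TLA accumulate time and words from failed exercises exactly as from successful ones.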





* Each form was completed by only three subjects in this period.


THE COMPREHENSION MEASURE


The comprehension measure was based on the total number of 
exercises correctly solved. The McCall transformation was 
applied. 

In accordance with the sixth criterion cited, a sufficiently liberal
amount of working time was allowed to enable practically
all subjects to complete the test.* Hence, this measure will be 
referred to as the ‘power of comprehension’ score. 


THE USE OF THE WITHIN-GRADES WITHIN-SCHOOLS CORRELATION 
COEFFICIENT 


The relationship of most interest to the classroom teacher 
is that which obtains for a particular class. It was decided, 
therefore, to use throughout this study the within-grades within- 
schools product-moment correlation coefficient. This coefficient 
is equivalent to the weighted mean of the correlations computed 
separately for each grade in each school, and consequently the 
possible effects of school and grade differences are eliminated 
from it.†
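
One common way of computing a within-groups coefficient is to pool the sums of squares and cross-products of deviations taken from each group's own means. The sketch below follows that approach; the function name `within_groups_r` and the data layout are assumptions, and the article does not say that the computation was carried out in exactly this manner.

```python
from math import sqrt


def within_groups_r(groups):
    """groups: list of (xs, ys) pairs of equal-length lists, one pair for
    each grade within each school."""
    sxy = sxx = syy = 0.0
    for xs, ys in groups:
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        # Deviations are taken from the group's own means, so differences
        # between schools and between grades do not enter the coefficient.
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
        syy += sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

Two classes whose means differ widely but whose members show the same relation within each class therefore yield the same coefficient as they would if their means coincided.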


RELIABILITY OF THE TEST 


Indices of reliability for the various measures described were 
determined by finding the correlation between the scores on 
Forms A and B of three hundred subjects selected at random from 
the larger group. Since in most phases of this study the subject’s 
composite score on Forms A and B‡ was used, reliability coefficients
for such composite scores were estimated by means of the
Spearman-Brown formula. The reliability coefficients for all the 
scores yielded by the experimental test are presented in Table I. 
The one per cent fiducial limits of each coefficient are given in the 
parentheses. 
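
For a test doubled in length, the Spearman-Brown formula gives the whole-test reliability as 2r/(1 + r), where r is the correlation between the two forms. The sketch below applies it, and also computes approximate one per cent limits by Fisher's z-transformation. The article does not state how its fiducial limits were obtained, so the second function is an assumed method, and both function names are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def spearman_brown(r_single, k=2):
    """Estimated reliability of a test k times as long as a single form."""
    return k * r_single / (1 + (k - 1) * r_single)


def fisher_limits(r, n, level=0.99):
    """Approximate limits for r from Fisher's z (an assumed method)."""
    z = atanh(r)
    # Standard error of z is 1 / sqrt(n - 3); widen by the normal deviate.
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)
```

With a single-form coefficient of .71 the formula gives about .83, and with three hundred subjects the assumed method gives limits of roughly .63 and .78 for that coefficient.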





* Of the 672 tested, only 3.1 per cent failed to complete Form A and 3.4 per 
cent failed to complete Form B. Since the test was administered to public 
school groups, it was not practicable to allow sufficient time for every 
individual to complete the test. 

† For a complete discussion of the advisability of employing a
within-groups coefficient of correlation, see E. F. Lindquist, Statistical
Analysis in Educational Research, p. 219ff.

‡ These composite scores were obtained by regarding the two forms as a
single test and following the procedures previously described.

TABLE I.—RELIABILITY INDICES FOR THE VARIOUS MEASURES
YIELDED BY THE EXPERIMENTAL TEST
(One Per Cent Fiducial Limits in Parentheses)

                                Reliability of    Reliability of
Measure                         Single Form       Whole Test
Power of Comprehension Score    .71 (.63-.78)     .83 (.77-.87)
Adjusted Rate Score             .68 (.59-.75)     .81 (.75-.86)
Unadjusted Rate Score           .71 (.62-.78)     .83 (.77-.87)
S-Score                         .67 (.58-.75)     .80 (.74-.85)
F-Score                         .75 (.67-.78)     .86 (.81-.90)
A-Score                         .75 (.68-.77)     .86 (.82-.90)
TLC-Score                       .78 (.71-.83)     .87 (.83-.90)
WT-Score                        .73 (.65-.79)     .84 (.79-.88)
TLA-Score                       .72 (.64-.79)     .84 (.79-.88)


VALIDITY OF THE POWER OF COMPREHENSION SCORE 


The experimental test under discussion was administered in 
March, 1943. During the preceding September the subjects had 
taken the battery of tests known as the Iowa Tests of Educational
Development.'? Included in this battery are three tests measuring 
ability to interpret social studies materials, natural science mate- 
rials, and literary materials. Together these tests comprise a 
total of two hundred forty items covering thirty selections, and 
require 170 minutes of actual working time. Correlation coeffi- 
cients were obtained for the scores on each of these tests and the 
power of comprehension score of the experimental test. The 
coefficients which are presented in Table II show that by these cri- 
teria the power of comprehension score on the experimental test is 
fairly valid as a measure of power to read with understanding. 


TABLE II.—CORRELATION OF THE EXPERIMENTAL POWER OF
COMPREHENSION SCORES WITH THE SCORES ON THE READING
TESTS OF THE IOWA TESTS OF EDUCATIONAL DEVELOPMENT

                                     One Per Cent
Test                          r      Fiducial Limits
Reading-Social Studies       .78     .71-.84
Reading-Natural Sciences     .73     .65-.80
Reading-Literature           .76     .69-.82


THE RELATIONSHIP OF THE ADJUSTED RATE SCORE TO OTHER
MEASURES OF READING RATE 


The experimental adjusted rate of comprehension measure 
developed for this study was used as a criterion in evaluating 
other tests of reading rate. The following tests were given to 
groups selected from the subjects who participated in the study.* 


1) Reading Rate Test, Board of Examinations, University of 
Chicago, 1943. Consists of a long selection on Lafayette. Score 
based on amount read in ten minutes. 

2) Iowa Silent Reading Test, Adv. Yields two rate scores: (a) 
called ‘rate’ is based on amount read in one minute on each of two 
short selections; and (b) called ‘rate of comprehension’ is a composite 
of (a) and a measure of the ability to answer correctly and in a rela- 
tively short time a number of questions over the selections used in 
obtaining (a), the testee being allowed two additional minutes to 
complete the reading of each selection. 

3) Traxler High School Reading Test. Reading rate score, based 
on amount read in five minutes. 

4) Minnesota Speed of Reading Test. Consists of a series of short 
paragraphs, each containing an irrelevant phrase. The score is the 
number of such phrases the testee correctly marks out in a limited 
time (six minutes). 


The correlations between the experimental adjusted rate of 
comprehension score and scores on the four tests cited are given 
in Table III. These coefficients are not extremely low. How- 
ever, neither are they sufficiently high to offer strong defense of 
the tests concerned as measures of rate of comprehension. It is 
of significance that the scores of the Minnesota Speed of Reading 
Test tend to correlate at least as highly with the experimental 
comprehension scores as they do with the adjusted rate scores, 
the correlation in the former case being .62 (.47-.74). This fact 
suggests that the scores on the Minnesota test are a function of 
ability to comprehend as well as of ability to read rapidly. It 
should also be noted that of the three rate tests yielding scores 
based on the amount read in a limited time the one which corre- 
lates most poorly with the criterion is the one which employs the 
shortest time limits.†





* One of the four tests was given in each of the four participating schools. 
† This bears out data reported by Traxler.”
TABLE III.—CORRELATION OF THE EXPERIMENTAL ADJUSTED
RATE OF COMPREHENSION SCORE WITH THE SCORES ON
OTHER TESTS OF READING RATE

                                                  One Per Cent
                                                    Fiducial
Test                                         r       Limits
Reading Rate Test, Board of Examinations,
  University of Chicago, 1943.............  .65     .54-.74
Iowa Silent Reading Test, Adv.:
  Rate....................................  .49     .31-.64
  Rate of Comprehension...................  .59     .43-.71
Traxler High School Reading Test:
  Reading Rate............................  .64     .50-.76
Minnesota Speed of Reading Test...........  .50     .33-.64


The correlations between the adjusted rate scores and the other 
measures of rate derived from the experimental test are given in 
Table IV. Coefficients are presented for these scores when based 
on identical materials (i.e., on the two forms combined) and also 


TABLE IV.—CORRELATIONS BETWEEN ADJUSTED RATE SCORE
AND OTHER RATE SCORES YIELDED BY EXPERIMENTAL TEST

(One Per Cent Fiducial Limits in Parentheses)

                                                r’s for Adjusted Rate
Rate                     r’s for Combined-      on Form A vs. Other
Measure                  Forms Scores           Scores on Form B
Unadjusted Rate Score..  .99 (.985-.995)        .67 (.58-.75)
S-Score................  .93 (.91-.95)          .64 (.54-.73)
F-Score................  .77 (.70-.83)          .51 (.39-.62)
A-Score................  .96 (.95-.97)          .65 (.55-.73)
TLC-Score..............  .63 (.53-.71)          .50 (.37-.61)
WT-Score...............  .95 (.93-.97)          .67 (.58-.75)
TLA-Score..............  .94 (.92-.96)          .67 (.58-.75)


on equivalent materials, (i.e., the adjusted rate score on Form A 
and the other rate scores on Form B). The reductions in the 
coefficients of the latter series are consistent with the differences 
in the reliabilities of the combined-forms scores and the single- 
form scores. Hence, the indices reported for the combined-form 
scores are not inflated by irrelevant factors resulting from the
identity of the materials. 

The extremely high relationship between the adjusted and 
unadjusted rate scores is consistent with the low correlation 
between reading rate and power of comprehension.* Evidently 
the unadjusted measures yielded by this particular experimental 
test are practically as valid as the adjusted measures. The low 
relationship between the adjusted rate measure and the TLC- 
score, which is a measure of comprehension in a limited time, is 
likewise consistent with the low rate-comprehension relationship. 

It is important to note that the correlation between the 
adjusted rate scores and F-scores is significantly† lower than that
between the adjusted rate scores and the S-scores. The dif- 
ference between these coefficients supports the earlier suggestion 
that rate of successful reading and rate of unsuccessful reading 
(i.e., of failing to achieve the purposes set) are not the same thing, 
and that a measure of rate of reading should be based solely upon 
material read with at least some demonstrated degree of under- 
standing rather than upon a mixture of comprehended and 
uncomprehended material. 


THE RATE-COMPREHENSION RELATIONSHIP 


The correlation between the adjusted rate score based on both 
forms and the power of comprehension score based on both forms 
was .30 (.21-.39).‡ This coefficient, while indicating a statistically
significant relationship, is too low to justify the view commonly
held by educational psychologists that a “moderately high
correlation exists between reading rate and comprehension.”!®
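The one per cent fiducial limits quoted with each coefficient can be approximated as follows. This is a minimal sketch which assumes that the limits were obtained through Fisher's z transformation; the text does not state the method, and the function name is illustrative only.

```python
import math

def fiducial_limits(r, n, z_crit=2.576):
    """Approximate limits for a correlation r based on n cases.

    Assumes Fisher's z transformation: z = atanh(r), with standard
    error 1 / sqrt(n - 3). z_crit = 2.576 is the one per cent
    (two-tailed) normal deviate used throughout the article.
    """
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# r = .30 on 672 cases, as reported for the adjusted rate score
lo, hi = fiducial_limits(0.30, 672)
print(f"{lo:.2f}-{hi:.2f}")  # 0.21-0.39
```

Under the same assumption, a coefficient of .31 based on 300 cases gives limits of about .17 and .44, which agrees with the first entry of Table V.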

Table V contains the correlation coefficients between the power 
of comprehension scores and the various rate scores derived from 
the experimental test. It should be noted that the relationship 
between the A-scores and the power of comprehension scores is 





* This relationship is discussed in the next section. 

† The test of significance applied is the one cited on page 503 of 1. Here
t = 11.673 (t.01 = 2.576).

‡ A test for linearity of regression was applied ('!, p. 235f) and the
hypothesis of linearity was found to be tenable. When rate was taken as 
the independent variable, the test resulted in F = 1.25 (Fos = 1.60). When 
comprehension was taken as the independent variable, the test resulted in 
F = 1.09 (Fi. = 1.60). 
significantly* lower than that between the adjusted rate scores
and the power of comprehension scores, despite the fact that the 
A-scores are highly related to the adjusted rate scores (.96). The 
complete lack of relationship between the F-scores and the power 
of comprehension scores suggests an explanation for this phe- 
nomenon. Since, as would be expected, the F-scores showed a 
zero correlation with power of comprehension scores, the practice 
of basing a measure of reading rate upon a fusion of time spent in 
successful and unsuccessful reading leads to a measure of rate 
which is not as highly related to power of comprehension as is one 
which is based solely upon successful reading. 


TABLE V.—CORRELATION BETWEEN COMPREHENSION SCORES AND
VARIOUS RATE SCORES YIELDED BY EXPERIMENTAL TEST

                                        One Per Cent
                                          Fiducial
Rate Score                        r        Limits
Adjusted Rate Score...........   .31*     .17-.44
Unadjusted Rate Score.........   .30      .16-.43
S-Score.......................   .30      .16-.43
F-Score.......................  −.01     −.16-.14
A-Score.......................   .25      .09-.38
TLC-Score.....................   .85      .80-.89
WT-Score......................   .22      .07-.36
TLA-Score.....................   .30      .16-.43


* This r, like the others in this table, is based on 300 cases. The value, 
.30, previously cited was based on 672 cases. 


A similar result occurred in the case of the WT-score, which also
fuses time spent in successful and unsuccessful reading.† The
one score available which fuses time spent in both successful and 
unsuccessful reading and which behaves in an apparently con- 
trary manner is the TLA-score. In this connection it should be 
noted, however, that since the exercises were arranged in order of 
ascending difficulty, the degree to which the fusion takes place in 
this score is bound to be considerably less than in the case of the 
WT-scores or A-scores. 





* t = 7.63 (t.01 = 2.576).
† t = 7.47 (t.01 = 2.576).
Finally, it will be recalled that the TLC-score is in part a measure
of comprehension and may not be regarded as strictly a rate
measure. The high relationship between this score and power of 
comprehension is consistent with the nature of this measure. 


THE USE OF IDENTICAL MATERIALS IN DETERMINING THE RATE- 
COMPREHENSION RELATIONSHIP 


The various rate scores and the power of comprehension scores 
used to determine the coefficients reported in Table V were based 
on identical materials. Correlations between such scores may be 
due in part to common factors irrelevant to the abilities which the 
tests are intended to measure. To check on the extent to which
such factors might be present, the correlations between the vari- 
ous rate scores on Form A and the power of comprehension scores 
on Form B were determined. There are two factors which might 
contribute to the lowering of these cross-form correlations from 
the values reported in Table V: (a) the fact that irrelevant common
factors are no longer present, and (b) the fact that scores
based on the separate forms are less reliable than scores based on 
the whole test (combined forms). 

A formula is given by Kelly® (p. 200) for predicting, from the 
relationship between a set of test scores and a criterion, what this 


TABLE VI.—COMPARISON OF CORRELATIONS BETWEEN COMPREHENSION
SCORES AND VARIOUS RATE SCORES ON DIFFERENT MATERIALS
AND ON IDENTICAL MATERIALS

                      r on Different Materials           r on
                      Actual r on       Estimated r      Identical
Rate Measure          Short Form        on Long Form     Materials
Adjusted Score......  .28 ( .14-.41)    .31               .31
Unadjusted Score....  .30 ( .16-.43)    .32               .30
S-Score.............  .25 ( .10-.41)    .27               .30
F-Score.............  .02 (−.14-.18)    .02              −.01
A-Score.............  .23 ( .08-.37)    .25               .24
TLC-Score...........  .67 ( .58-.75)    [illegible]       .85
WT-Score............  .20 ( .05-.34)    .22               .22
TLA-Score...........  .29 ( .15-.42)    .31               .30
same relationship will be if the test scores are made more reliable
by increasing the length of the test a given amount. Taking the 
power of comprehension score as the criterion, this formula was 
applied to the cross-form relationships to estimate the relation- 
ships for the whole test. In Table VI are presented (a) the cross- 
form relationships, (b) the estimates made by means of the Kelly 
formula, and (c) the actual whole test relationships. 
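The kind of estimate described above may be illustrated by the following sketch. It assumes that the formula referred to is the familiar one for the correlation of a lengthened test with a criterion, r' = r * sqrt(k) / sqrt(1 + (k - 1) * r_xx); the article does not reproduce the formula, and the single-form reliability of .63 used in the example is a hypothetical value chosen only for illustration.

```python
import math

def validity_if_lengthened(r_xy, r_xx, k):
    """Estimate the test-criterion correlation for a test k times as long.

    r_xy : observed correlation between the short test and the criterion
    r_xx : reliability of the short test (hypothetical in the example below)
    k    : factor by which the test is lengthened (2 for two forms combined)
    """
    return r_xy * math.sqrt(k) / math.sqrt(1 + (k - 1) * r_xx)

# Cross-form r of .28 for the adjusted score (Table VI), with an
# assumed single-form reliability of .63 and the test doubled in length.
estimate = validity_if_lengthened(0.28, 0.63, 2)
print(round(estimate, 2))  # 0.31
```

With a perfectly reliable short form (r_xx = 1) the function returns the observed correlation unchanged, which is the expected limiting case: lengthening a test can raise its correlation with a criterion only by removing unreliability.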

It will be noted that with the exception of the TLC-score, which 
is not strictly a measure of reading rate, the relationships esti- 
mated by the Kelly formula are very close to the corresponding 
actual relationships; hence, the reductions in the correlations 
between rate and power of comprehension scores derived from 
different but equivalent materials are no greater than can be 
accounted for entirely by the lower reliabilities of the scores. 
Therefore, it appears that the higher correlations obtained for 
identical materials were not due to irrelevant factors common to 
both rate and power of comprehension scores, and that the rela- 
tionship between reading rate and comprehension, as defined in 
this study, is the same whether the materials used to derive the 
scores are identical or equivalent. 


DIFFERENCES IN THE READING BEHAVIOR OF GOOD AND POOR 
COMPREHENDERS 


Anderson,? in studying eye-movements, reports that good 
readers slow down their absolute reading rate as the material 
increases in difficulty but that poor readers continue reading at 
approximately the same absolute rate regardless of differences in 
the difficulty of the material. If this is the case, and assuming 
that the exercises failed are—for the given individual, at least— 
the most difficult, it follows that, where the whole group is con- 
sidered, the good readers should make lower and the poor readers 
higher relative rate scores on these exercises than they did on the 
easier ones which they solved correctly. 

To check on this possibility, the mean of the differences 
between the S-scores and the F-scores for the forty-five best com- 
prehenders in the special group of three hundred was obtained. 
This mean difference was 4.9 and was statistically significant.*





* The standard procedure for testing the significance of differences between
pairs of related measures was employed (1', p. 58f). t = 4.379 (t.01 = 2.576).
On the other hand, in the case of the forty-five poorest comprehenders
the mean difference was −4.8.* This mean difference
was also statistically significant.†


TABLE VII.—MEANS OF RAW TIME SCORES* OF GOOD AND POOR
COMPREHENDERS ON EASY AND DIFFICULT MATERIALS

                            Mean on Easy    Mean on Difficult
Class of Readers            Exercises       Exercises
Good Comprehenders........      8.3              10.1
Poor Comprehenders........      9.6               9.4

* One unit equals ten seconds.


As a further check on Anderson’s findings, the raw time scores 
on each of six difficult and six easy exercises‡ were determined
for these same groups of the forty-five best and forty-five poorest 
comprehenders. The means of these raw time scores for each 
group on each class of exercises are presented in Table VII. To 
determine whether the interaction between these means was 


TABLE VIII.—ANALYSIS OF VARIANCE OF RAW TIME SCORES
MADE BY GOOD AND POOR COMPREHENDERS ON EASY AND
DIFFICULT EXERCISES

Sources of Variation       d.f.    Sums of Squares    Variances
Type of Exercise.........     1           4.67           4.67
Class of Reader..........     1          28.01          28.01
Interaction..............     1          46.01          46.01
Within Groups............   176         538.27           3.06

Total....................   179         616.96


statistically significant, an analysis of variance was carried out, 
and the hypothesis of ‘no interaction’ was tested. It was found 





* This negative difference does not mean that the poor comprehenders 
actually read the exercises they missed more rapidly in an absolute sense. 
The success T-scores and failure T-scores are relative measures. This nega- 
tive difference indicates only that the poor readers tend to make better rela- 
tive time scores on the exercises they miss. These better relative time scores 
may be due more to the slowing down of the good readers than to any speed- 
ing up of the poor readers. For information on the effect of difficulty upon 
the absolute rates of good and poor readers see Table VII. 

† t = 5.040 (t.01 = 2.576).

‡ The exercises were selected on the basis of difficulty indices determined
in connection with the construction of the test. 
this hypothesis could be rejected beyond the one per cent level
of confidence. The results of this analysis are presented in Table 
VIII. 
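The interaction tested here can be checked roughly against the published means. The sketch below assumes a balanced two-by-two arrangement with forty-five scores in each cell, which is consistent with the 176 degrees of freedom within groups (4 x 44); the function name is illustrative, and because the means in Table VII are rounded, the result can only approximate the tabled sum of squares.

```python
def interaction_ss(cell_means, n_per_cell):
    """Interaction sum of squares for a balanced 2 x 2 layout.

    cell_means is ((good_easy, good_difficult), (poor_easy, poor_difficult)).
    """
    (a, b), (c, d) = cell_means
    contrast = a - b - c + d          # difference of the two differences
    return n_per_cell * contrast ** 2 / 4

# Means from Table VII (one unit equals ten seconds)
table_vii = ((8.3, 10.1), (9.6, 9.4))
ss = interaction_ss(table_vii, 45)    # about 45, against 46.01 in Table VIII

# F ratio for interaction taken directly from Table VIII
f_ratio = 46.01 / 3.06                # about 15.04

print(round(ss, 2), round(f_ratio, 2))
```

The value recovered from the rounded means, about 45, is close to the 46.01 tabled, and the F ratio of about 15 on 1 and 176 degrees of freedom is in keeping with the rejection of the hypothesis of no interaction.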

Thus the findings of this study strongly support those reported 
by Anderson. The fact that the mean of the absolute time scores
of the forty-five best comprehenders shifts from below to above
that of the forty-five poorest as the materials shift from easy to
difficult (Table VII) is striking evidence of how marked the difference
in the reading behavior of good and poor comprehenders actually is.


THE HYPOTHESIS OF A GENERAL RELATIVE READING RATE 


In computing the adjusted and unadjusted rate scores, it was 
assumed that a given individual tends to maintain approximately 
the same relative reading rate in a given group from one selection 
to another despite differences such as are usually found in the 
content and difficulty of the individual selections in reading com- 
prehension tests. This is equivalent to assuming that the indi- 
vidual’s relative rate scores on the separate exercises are quite 
closely concentrated about their mean. 

To check on the extent to which an individual’s relative rate 
scores on individual exercises tend to concentrate about their 
mean a simple analysis of variance was effected, and the estimated 
between-individual’s variance compared with the estimated 
within-individual’s variance. The results of this analysis, which 
are presented in Table IX, show that despite differences within


TABLE IX.—ANALYSIS OF VARIANCE OF ADJUSTED RATE SCORES
MADE BY INDIVIDUAL SUBJECTS ON INDIVIDUAL EXERCISES

Sources of Variation       d.f.    Sums of Squares    Variances
Between-Individuals......   299       299306.14        1001.02
Within-Individuals.......  7081       341321.19          48.20

Total....................  7380       640627.33

F = 20.77, F.01 < 1.28.


the same individual from exercise to exercise the test discrimi- 
nates between individuals. Hence, even if an individual’s true 
adjusted rate scores are not identical from exercise to exercise, 
such differences plus differences due to errors in measurement are 
not sufficiently marked to destroy the effectiveness of the rate
score.
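The variances and the F ratio of Table IX follow directly from the tabled sums of squares and degrees of freedom, as the short check below shows. Only the figures printed in the table are used; the variable names are illustrative.

```python
def mean_square(sum_of_squares, df):
    """Variance estimate: a sum of squares divided by its degrees of freedom."""
    return sum_of_squares / df

# Entries from Table IX
ms_between = mean_square(299306.14, 299)    # between individuals
ms_within = mean_square(341321.19, 7081)    # within individuals
f_value = ms_between / ms_within

print(round(ms_between, 2), round(ms_within, 2), round(f_value, 2))
# 1001.02 48.2 20.77
```

The between-individuals variance is thus about twenty-one times the within-individuals variance, which is the basis for the statement that the test discriminates between individuals.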
A further check on the hypothesis of a general relative reading
rate has recently been made by Helms.’ Helms selected the 
fourteen easiest and fourteen most difficult exercises from the 
experimental test here described.* He obtained for each of three 
hundred subjects a composite adjusted rate score on the fourteen 
easy exercises by averaging their adjusted rate scores on those 
exercises to which they responded correctly and converting these 
averages into McCall T-scores. In a like manner a composite 
adjusted rate score was derived for each subject from the fourteen 
difficult exercises. The correlation between these two sets of 
scores was .74 (.66-.80). This coefficient is as high as some which 
have been reported between equivalent forms of the same rate 
test and hence constitutes additional evidence in support of the 
hypothesis, at least within the range of difficulty represented by 
the particular fourteen easy and fourteen difficult exercises.†
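The derivation of a composite score of the kind Helms used may be sketched as follows. The sketch assumes that a McCall T-score is the usual normalized score with mean 50 and standard deviation 10, obtained here from mid-percentile ranks within the group; the function names and the sample figures are hypothetical and are not taken from Helms's data.

```python
from statistics import NormalDist

def composite_rate(rate_scores, answered_correctly):
    """Average a subject's rate scores over the exercises answered correctly."""
    kept = [s for s, ok in zip(rate_scores, answered_correctly) if ok]
    return sum(kept) / len(kept) if kept else None

def to_t_scores(values):
    """Convert raw composites to normalized T-scores (mean 50, SD 10)."""
    n = len(values)
    unit_normal = NormalDist()
    t_scores = []
    for v in values:
        below = sum(1 for u in values if u < v)
        ties = sum(1 for u in values if u == v)
        percentile = (below + 0.5 * ties) / n     # mid-percentile rank
        t_scores.append(50 + 10 * unit_normal.inv_cdf(percentile))
    return t_scores

# One hypothetical subject: three exercises, the second one failed
print(composite_rate([40.0, 60.0, 55.0], [True, False, True]))  # 47.5

# Three hypothetical composites converted to T-scores
print([round(t, 1) for t in to_t_scores([1.0, 2.0, 3.0])])
```

Only the exercises answered correctly enter the average, so that the composite reflects rate of successful reading alone.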


THE EFFECT OF THE DIFFICULTY OF THE MATERIAL UPON THE 
RATE-COMPREHENSION RELATIONSHIP 


In the study just referred to, Helms also derived power of com- 
prehension scores for the three hundred subjects from the fourteen 
easy exercises by converting the number right into McCall T- 
scores. In a like manner he obtained power of comprehension 
scores on the fourteen difficult exercises. The correlation 
between the rate and comprehension scores derived from the 
easy exercises was .29 (.15-.42), whereas for the difficult exercises 
this relationship was found to be .31 (.17-.44). 

This result is contrary to that reported by Tinker'® who found 
that as the materials increased in difficulty the relationship 
between rate and comprehension decreased. However, Tinker 
used as a measure of rate the number of exercises attempted 
(right and wrong) in a limited time. As the difficulty of the 
materials increases the factor of time spent in unsuccessful reading 
obviously becomes an increasing element of such a rate measure. 
Since rate scores based on time spent in unsuccessful reading bear 
no relationship to power of comprehension, the trend which 





* The exercises were selected on the basis of difficulty indices determined 
in connection with the construction of the test. 

† The average difficulty of the fourteen easy exercises was .75 (i.e., on the
average 75 per cent of the subjects responded correctly to the easy exercises). 
The average difficulty of the fourteen difficult exercises was .52. 
Tinker reports is precisely what would be expected when such a
rate measure is employed. 


SUMMARY 


The purposes of this study were to develop a test for the meas- 
urement of rate of comprehension of reading, and to employ this 
test to study the relationship between rate of comprehension and 
power of comprehension. 

The unique characteristics of the test developed are (a) the 
control of reading purpose, (b) the use of only the time spent in 
comprehending at a certain level in deriving the rate score, and 
(c) the technique of combining a set of relative rate scores to 
obtain a composite rate score (comparable to similar composites
based on different sets of selections).

The test consisted of a number of exercises varying in content 
and difficulty. Each exercise was composed of (a) a specific 
question, (b) a reading selection containing the answer to this 
question, and (c) several suggested answers to the question, one of 
which was definitely better than any of the others. Reading 
purpose was controlled by directing the subjects first to read the
question and then to read the selection for the express purpose of 
finding the answer to the question. The subjects were further 
directed to read at that rate which seemed to them personally 
the most efficient for the accomplishment of this purpose. Each 
subject indicated his choice of the best answer to each question, 
and also recorded the time of beginning and the time of finishing 
each exercise. The subjects were fully aware that they were 
timing themselves on each exercise. This awareness was deemed 
sufficient to prevent them from spending time in excess of that 
needed to accomplish the purpose set. From the data thus 
obtained it was possible to derive a measure of an individual’s 
rate of reading based only upon reading exercises which he was 
known to have comprehended at a set level. Other measures of 
‘reading’ rate based on time spent in both successful and unsuc- 
cessful reading were obtained from this same instrument, as well 
as a measure based only upon the time spent on exercises failed. 
A measure of power to comprehend was also derived from this 
same test. The test was administered to 672 eleventh- and 
twelfth-grade pupils in four middle-sized Iowa high schools. The 
principal findings follow: 
1) The relationship between rate of reading comprehension
and power of reading comprehension is significant but low, the 
correlation (within-grades within-schools) being approximately 
.30. 

2) An individual tends to maintain approximately the same 
rank in rate of successful reading in a given group despite dif- 
ferences such as are usually found in the difficulty and nature of 
the individual selections used in reading comprehension tests. 

3) Good comprehenders adjust their rate of reading by slowing 
down as the material increases in difficulty, whereas poor com- 
prehenders apparently read easy and difficult materials at much 
the same rate. 

4) Significant differences were found between measures of read- 
ing rate based only upon materials comprehended and measures 
of reading rate based upon materials not comprehended or upon 
a mixture of comprehended and uncomprehended materials. No 
relationship was found between rate based on uncomprehended 
materials and power of reading comprehension. 

5) When the experimental reading rate test here described is 
used as a criterion, the validities of certain existing rate tests* are 
low (range .49 to .65). Rate scores which in part measure com- 
prehension are poor measures of reading rate. 
TRANSFER IN CHILDREN’S MAZE LEARNING
H. E. JONES AND M. BATALLA 


University of California 


The experiments described below were designed to test the 
presence of transfer of training from one maze pattern to another, 
where the patterns were similar but were incorporated in different 
situations. Two mazes, a stylus maze and a ‘macro-maze’ 
(life-size alley maze), were employed with two groups of subjects 
in the following balanced order: 


Group I                      Group II
Order 1  Stylus maze         Order 1  Macro-maze
Order 2  Macro-maze          Order 2  Stylus maze


EXPERIMENTAL SAMPLE 


The sample, consisting of seventy-three boys and girls 
averaging twelve years of age, was drawn from an urban school 
population in the high sixth and low seventh grades.* All 
of the subjects had had previous experience with learning experi- 
ments and other laboratory situations at the Institute of Child 
Welfare. Observational records indicate a generally high level 
of interest and motivation. 

The subjects were divided at random in two groups, with 
sampling characteristics as shown in Table I: 


TABLE I.—CHARACTERISTICS OF THE SAMPLE

                         CA in Years             MA (Terman Group Test)
              No.     Range      Mean   SD       Range     Mean    SD
Group I.....   37   11.1-13.0    12.1   .46     9.2-15.2   12.7   1.24
Group II....   36   11.1-13.3    12.1   .46     9.6-14.4   12.8   1.05


THE LEARNING TASKS


The maze pattern adopted for this experiment was one which 
our subjects could solve, on the average, in about twelve trials 





* For reports of maze transfer experiments with adults, see references 2, 
5, 6. 
(by the criterion of two consecutive errorless runs). A problem
of this degree of difficulty was judged appropriate for a transfer 
experiment, since an easier problem might too readily invite 
the use of simple verbal solutions, and a more difficult problem 
Fig. 1.—The Maze Pattern.


would extend beyond the time suitable for an experimental 
session with school children. 

The life-size maze was constructed outdoors on a wooden 
platform fifteen feet square. It was divided into twenty-five 
cells each three feet square, with walls six feet high. These 
walls were composed either of wood or of heavy denim curtains, 
the latter being used at the entrances of blind alleys, and also to
separate adjoining cells on the correct path. In traversing the 
maze, the subject could at no time see more than three feet ahead, 
and must lift a curtain before moving from one cell to another. 

The stylus maze, eighteen inches square, resembled a large 
‘waffle’ with grooves intersecting at right angles. When a 
stylus was introduced into the maze at the entrance, it could 
not be removed until the exit was found. As seen from above, 
the grooves or slots appeared to be continuous, with no blind 
alleys; by means of concealed stops, however, it was possible to 
set up a true-path pattern identical with that of the macro-maze.* 

This pattern is shown in Figure 1, the double line indicating 
the course of the true path, and the double broken line indicating 
the position of the blind alleys in the two mazes. Since the stylus 
maze could be traversed more quickly than the life-size maze, 
it was provided with an additional number of blind alleys in 
order to make the learning periods on the two mazes more nearly 
equivalent; these additional culs, present only in the stylus maze, 
are shown in Figure 1 by single dotted lines. 


EXPERIMENTAL PROCEDURE 


Each of the thirty-seven subjects in Group I, taking the stylus 
maze first, received the following instructions: 


“This is a little maze that has many pathways . . . some 
right and some wrong. You are to take this stylus, go in at 
this front door, and push it through the pathways until you 
come out over here. . . . Remember that there is one quick 
way through the maze and there are also some blind alleys. 
If you go into a blind alley you have to back out again. Try 
to find your way as quickly as you can, without going into 
blind alleys. After you have learned this maze, I am going 
to take you outdoors to another, bigger maze that has path- 
ways much like this. Try to learn this as quickly as you 
can, and remember what you have learned.”





* A similar type of slot maze is described in greater detail by Jones and 
Yoshioka.’ 
Each of the thirty-six subjects in Group II, taking the life-size
maze first, was instructed: 


“I am going to let you go in this little house, and you
must find your way out as quickly as you can. Here is the 
front door that you go in, and here is the door where you 
come out. There is a quick way through the maze, and 
there are also some blind alleys. If you go into them you 
have to turn around and come out again. Try to find your 
way as quickly as you can, without going into blind alleys.
After you have learned this maze, I am going to take you 
indoors and let you work with a little maze that has path- 
ways much like this. Try to learn this as quickly as you 
can, and remember what you have learned.”


In undertaking the second part of the experiment, each subject 
was given instructions appropriate to the maze on which he was 
now working, and was also reminded that the correct pathway 
was similar to that on the maze which he had already learned. 
Errors, time, and trials were recorded. In the case of the life-size 
maze, the details of performance were recorded by one-way 
observation from above the maze. 


RESULTS 


Table II presents results for each part of the experiment, in 
terms of trials and error scores. 

Although Groups I and II are equal in their average mental 
test scores, we have no assurance that they are similarly equated 
in maze learning abilities. Therefore, in examining the data 
for possible transfer effects it will be necessary to average the 
records for the two groups, on Order 1 as compared with Order 2, 
in this way balancing out possible effects of sampling differences. 
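The balancing described above can be illustrated with the trial means of Table II. The sketch assumes that the two groups were combined by weighting each mean by its number of subjects; the text does not say whether a weighted or a simple average was taken, and the function name is illustrative.

```python
def order_mean(n_a, mean_a, n_b, mean_b):
    """Combine two group means for one order, weighting by group size."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)

# Trials to criterion, from Table II
# Order 1: Group I (37) on the stylus maze, Group II (36) on the macro-maze
order_1 = order_mean(37, 12.4, 36, 12.2)
# Order 2: Group II (36) on the stylus maze, Group I (37) on the macro-maze
order_2 = order_mean(36, 9.3, 37, 14.2)

print(round(order_1, 2), round(order_2, 2))
```

Each order mean draws on both groups and both mazes, so that a difference between groups or between mazes cannot masquerade as a transfer effect. The values obtained, about 12.3 and 11.8, are close to the 12.3 and 11.7 entered in Table III; small discrepancies are to be expected because the means in Table II are rounded.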


By this method we can conclude as to ‘average’ transfer 
effects, but cannot reach any conclusion as to possible differen- 
tial effects of training on the two mazes taken separately. 
It might be expected that if positive transfer were to occur 
at all, it would operate ‘from’ the stylus maze, for in learning 
this maze the subject maintains a constant orientation toward 
it and has every opportunity to achieve a clear motor-perceptual
impression of the correct path; upon tracing the
correct path he can immediately review, visually, the course 
which he has taken. In the large maze, no such possibility 
exists for reviewing the pattern as a whole. Actually, as 
shown in Table II, no reliable differences occur between the 
two orders, but a tendency is apparent for poorer scores to be 
obtained after the stylus maze training, and for better scores 


TABLE II.—STATISTICAL CONSTANTS FOR EACH PART OF THE
EXPERIMENT

                        Order 1                 Order 2
                    Mean          SD        Mean          SD
Stylus Maze         Group I                 Group II
  Trials..........  12.4          6.8        9.3          6.7
  Errors..........  [illegible]  33.3       28.7         23.8
Macro-maze          Group II                Group I
  Trials..........  12.2          5.8       14.2          7.2
  Errors..........  [illegible]  18.4       41.3         24.4


to be obtained after the training on the large maze. These 
obtained differences are probably to be attributed to measurement
and sampling errors, since the writers see no basis for
hypothesizing opposed types of transfer (negative and positive 
in the two instances). 


Table III presents the comparison between the two orders, 
with possible differences between groups, and between mazes, 
balanced out. It is apparent that the two sets of measurements 
are substantially identical.* Average time scores were also 
nearly the same (4’ 17” for Order 1, 4’ 12” for Order 2). 





* In computing SD’s for combined distributions, it seemed desirable to
minimize the effects of differences between the mean scores for different
mazes. Accordingly, for averaging the SD’s in Table II, for a given order,
the formula (, p. 120) used was

$$\sigma_{12}^{2} = \frac{N_1\sigma_1^{2} + N_2\sigma_2^{2}}{(N_1 - 1) + (N_2 - 1)}$$
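The formula can be tried on the error scores of Table II, as in the sketch below. The reading of the formula as a combined variance equal to (N1 s1^2 + N2 s2^2) / ((N1 - 1) + (N2 - 1)) is a reconstruction from a damaged printing, and the function name is illustrative.

```python
import math

def combined_sd(n1, sd1, n2, sd2):
    """SD for two groups combined, ignoring the difference between their means."""
    pooled_variance = (n1 * sd1 ** 2 + n2 * sd2 ** 2) / ((n1 - 1) + (n2 - 1))
    return math.sqrt(pooled_variance)

# Error scores, Order 1: Group I on the stylus maze (N = 37, SD = 33.3)
# and Group II on the macro-maze (N = 36, SD = 18.4)
print(round(combined_sd(37, 33.3, 36, 18.4), 1))  # 27.4

# Error scores, Order 2: Group II on the stylus maze (N = 36, SD = 23.8)
# and Group I on the macro-maze (N = 37, SD = 24.4)
print(round(combined_sd(36, 23.8, 37, 24.4), 1))  # 24.4
```

Both results agree with the SD's for errors entered in Table III, which lends some support to this reading of the formula.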
Fig. 2.—Learning Curve for the Training Series (Order 1) and Transfer Series
(Order 2). 
The absence of transfer effects is also shown in Figure 2, which
presents learning curves in terms of average errors and average 
time per segment of learning, when each series is divided into ten 
Vincent segments. !* 


TABLE III.—STATISTICAL CONSTANTS FOR EACH ORDER

                        Order 1                 Order 2
                   (Training series)       (Transfer series)
                    Mean          SD        Mean          SD
Trials............  12.3          6.3       11.7         [illegible; printed as "78"]
Errors............  38.6         27.4       35.0         24.4


DISCUSSION


Woodworth has pointed out that “trial and error behavior . . . 
is directed toward a goal but is not controlled by any explicit 
perception of the relationships involved” ('%, p. 747). An ade- 
quate perception of relationships may emerge in the gradual 
course of learning but it is significant that in the present experi- 
ment the process of discovery was as variably exploratory on the 
second as on the first maze, with no reduction in the necessity of 
“trying this-and-that lead to the goal” (4, p. 272).

This absence of transfer, when two similar maze patterns are 
learned in immediate succession, must be regarded as evidence for 
the specific nature of maze learning. Although each maze is 
mastered within a small number of trials, and although the sub- 
jects understand that the same true path is involved in the two 
mazes, the average results indicate little or no transposition from 
one maze to the other. The learning curves for the two halves 
of the experiment represent what might be expected from two 
patterns entirely different but of similar difficulty. 

Our subjects have obviously made little use of ideational proc- 
esses* carried by a verbal formula appropriate to the two mazes. 
They have acquired little or no grasp of the ‘field properties’ of 
one maze, which would permit them to transfer an understanding 





*For recent studies of the relation of transfer to understanding and 
explicit generalization, see, for example, references 8, 9. 
of its configuration to the other maze having a similar pattern.
It may be that they see each maze as a ‘structural whole,’ but if 
so their whole responses are functionally useless in enabling them 
to adapt to the common pattern; in such a case, some question 
may be raised as to the precise operational meaning of the term 
‘structural whole.’ On the other hand, the results are quite 
compatible with an interpretation of maze learning as involving 
a summation of specific trial-and-error achievements repeated 
de novo, without insight, in each new motor-perceptual context. 
We may regard this, in Woodworth’s phase, as “place learning 
. closely bound to the actual place.” 


A POSSIBLE INFERENCE FOR EDUCATION 


It has been observed that “one object in borrowing the maze
from the animal laboratory is to see how well human subjects
perform a task which even rats do very well. Another object is
to diversify the material used in the study of human learning”
(3, p. 141). The writers would be the last to urge that the
behavior of a twelve-year-old child in a maze (or, much less, the
behavior of a rat in a maze) has implications broadly useful for
education. Such implications, however, have been rather freely 
suggested in some of the previous work in this rather limited field. 
In particular, it would appear that Gestalt formulations based on 
maze learning may have led us to be too adultomorphic concern- 
ing the nature and generalizability of children’s learning. When 
maze learning is interpreted in terms of its ‘judgmental charac- 
ter’ and the ‘abstracting’ of the goodness of certain responses,'! 
and when the white rat is described as attacking his problems 
with ‘hypotheses’ and ‘insight,’ we are tempted to hope that 
sixth-grade children will show at least as great ingenuity. If, 
however, in learning a maze a subject fails to acquire anything at 
all that he can transpose to the learning of another maze of the 
same true path, the question may well be raised as to whether the 
original learning took place at the level predicated. 

As Brownell¹ has pointed out, learning can be generalized: “The
possibility and the desirability of transfer cannot be questioned.
The problem then becomes one of so organizing the materials and
methods of instruction to guarantee the largest possible amount
of positive transfer.” Those who have emphasized the insightful
and purposive aspects of learning seem to have provided a
wish-fulfilment for some educators who are glad to take a maximal
view of education. It may be necessary to remind such hopeful
persons that if a teacher is to be in a sound position to make
learning more broadly effective he must estimate accurately 
rather than overestimate the learning capacities and procedures 
of his pupils. In fairness, the converse should also be noted— 
that teachers who regard all tasks of learning as rote mechanical 
processes will tend to equip children with non-transferable 
specifics, and will fail to offer the further essential step of helping 
them to relate and organize what they have learned. 


SUMMARY 


1) A life-size alley maze and a stylus maze were constructed 
with the same true-path pattern adapted to a large and a small 
scale. 

2) Seventy-three sixth- and seventh-grade pupils were divided 
into two groups and tested on the two mazes in a balanced order. 
Before learning the first maze, they were instructed that they 
would be required to learn a second maze, different in size but 
with similar pathways. 

3) Average learning curves were practically identical in the 
training and transfer parts of the experiment. Each maze was 
learned in from nine to fourteen trials on the average, but with no 
demonstrable transfer effect. 

4) In addition to showing the specific nature of maze learning, 
the results were regarded as compatible with an interpretation of 
such learning as summative rather than configural, and as 
relatively mechanical rather than insightful.


ON THE ESTIMATION OF THE CHANGES
IN CORRELATION AND REGRESSION CONSTANTS
DUE TO SELECTION ON A SINGLE
GIVEN VARIABLE


HUBERT E. BROGDEN 
Personnel Research Subsection, The Adjutant General’s Office 


Investigators concerned with the prediction of success in 
schools, training programs, or business and industry frequently 
find that they must limit themselves to a population that has 
been selected as above a given value on a test or some other 
variable. The problem of determining the correlation between 
a predictor and a criterion in the unselected population of those 
accepted is often very important. The present paper is con- 
cerned with the solution to this and similar problems. 

Since the problems discussed herein will be limited primarily 
to those in which selection has occurred on only one variable, 
a clear statement of the manner in which the word ‘selection’ 
will be employed is essential. In the following, selection on x
(and only x) will mean alteration of the distribution of x without
alteration of the distributions of any other variable (except for
changes due to sampling error) within any population having a
constant x value.

It will be convenient to designate the variable on which
selection occurs by the letter x, to indicate constants in the
population from which the estimates are made by the subscript
(1) and constants in the population for which the estimates are
made by the subscript (2). Populations (1) and (2) differ only
because of differential selection on x. The letter k refers to the
ratio $\sigma_{x_2}/\sigma_{x_1}$.

This definition of selection has definite implications for
assumptions made in deriving linear regression equations. If linear
regression of y on x, or homoscedasticity, or equal correlation of,
say, y and z within constant values of x is assumed, it follows
that, except for sampling error, $\beta_{xy_1} = \beta_{xy_2}$,
$\sigma_{y \cdot x_1} = \sigma_{y \cdot x_2}$, and
$r_{yz \cdot x_1} = r_{yz \cdot x_2}$. That is, if such constants are equal for all
populations having a given constant x value they are equal for
all summations over varying x values, so long as selection has
occurred on x only.

Expressions for $M_{y_2}$, $\sigma_{y_2}$, $r_{xy_2}$, and $r_{yz_2}$ will be derived (where y
and z are any variables other than x). Thus

(1)  $M_{y_2} = \dfrac{\Sigma y}{N}$ *

Substituting $\bar{y} + (y - \bar{y})$ for y and remembering that $(y - \bar{y})$
adds to zero for any constant value of x if the regression of y
on x is linear and if selection has occurred on x only

(2)  $M_{y_2} = \dfrac{\Sigma \bar{y}}{N}$

Substituting from the regression equation of y on x and
remembering that regression constants are unaffected by selection
on x we have

(3)  $\bar{y} = \beta_{xy_1} x + M_{y_1} - \beta_{xy_1} M_{x_1}$.

Substituting for $\bar{y}$ in (2)

(4)  $M_{y_2} = \beta_{xy_1} \dfrac{\Sigma x}{N} + M_{y_1} - \beta_{xy_1} M_{x_1}$

or

(5)†  $M_{y_2} = \beta_{xy_1}(M_{x_2} - M_{x_1}) + M_{y_1}$

is obtained.
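
As a rough illustration of formula (5), the following Python sketch computes the estimated mean of y in population (2) from the regression slope and the means of x. The function name and the numerical values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def estimate_mean_y2(beta_xy1, mean_x1, mean_x2, mean_y1):
    """Formula (5): M_y2 = beta_xy1 * (M_x2 - M_x1) + M_y1.

    beta_xy1 is the slope of the regression of y on x in population (1),
    assumed unchanged by selection on x.
    """
    return beta_xy1 * (mean_x2 - mean_x1) + mean_y1


# Hypothetical values: the selected group (1) averages 60 on x,
# the complete group (2) averages 50.
m_y2 = estimate_mean_y2(beta_xy1=0.5, mean_x1=60.0, mean_x2=50.0, mean_y1=70.0)
print(m_y2)
```

With these assumed inputs the estimated mean of y falls by half of the ten-point difference in x means, as the slope of 0.5 implies.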
In deriving an expression for $\sigma_{y_2}$, $[\bar{y} + (y - \bar{y})]$ will be
substituted for y, and $[\beta_{xy_1}(x - M_{x_1}) + M_{y_1}]$ for $\bar{y}$. The formula

(6)  $\sigma^2_{y_2} = \dfrac{\Sigma y^2}{N} - M^2_{y_2}$

becomes

(7)  $\sigma^2_{y_2} = \dfrac{\Sigma[\beta_{xy_1}(x - M_{x_1}) + M_{y_1} + (y - \bar{y})]^2}{N} - M^2_{y_2}$.

Expanding, summating, reducing, (7) becomes

(8)  $\sigma^2_{y_2} = \beta^2_{xy_1}\left[\dfrac{\Sigma x^2}{N} - M^2_{x_2}\right] + \dfrac{\Sigma (y - \bar{y})^2}{N}$

since $\Sigma(y - \bar{y})$ is zero for constant values of x. The term
$\Sigma(y - \bar{y})^2 / N$ equals $\sigma^2_{y_2}(1 - r^2_{xy_2})$, which in turn equals
$\sigma^2_{y_1}(1 - r^2_{xy_1})$ with the assumption of homoscedasticity.
Hence, remembering that $\beta_{xy_1} = \beta_{xy_2}$,

(9)  $\sigma^2_{y_2} = \beta^2_{xy_1}\sigma^2_{x_2} + \sigma^2_{y_1}(1 - r^2_{xy_1})$,

(10)  $\sigma_{y_2} = [r^2_{xy_1}\sigma^2_{y_1}k^2 + \sigma^2_{y_1}(1 - r^2_{xy_1})]^{1/2}$,

and

(11)  $\sigma_{y_2} = \sigma_{y_1}[1 - (1 - k^2)r^2_{xy_1}]^{1/2}$.

* Summations are over population (2) unless specific indication to the
contrary is given.

† Those familiar with the derivation of the bi-serial correlation coefficient
will recognize (5) as an equation from which the bi-serial can be easily
obtained.


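In order to determine $r_{xy_2}$ an expression for $\Sigma xy/N - M_{x_2}M_{y_2}$
must be derived. Substituting $\bar{y} + (y - \bar{y})$ for y and remembering
again that $(y - \bar{y})$ adds to zero for any constant value of x
we have

(12)  $\dfrac{\Sigma xy}{N} - M_{x_2}M_{y_2} = \dfrac{\Sigma x\bar{y}}{N} - M_{x_2}M_{y_2}$.

Substituting $\beta_{xy_1}x + M_{y_1} - \beta_{xy_1}M_{x_1}$ for $\bar{y}$ we obtain

(13)  $\dfrac{\Sigma xy}{N} - M_{x_2}M_{y_2} = \beta_{xy_1}\dfrac{\Sigma x^2}{N} + \dfrac{\Sigma x}{N}(M_{y_1} - \beta_{xy_1}M_{x_1}) - M_{x_2}M_{y_2}$,

or

(14)  $\dfrac{\Sigma xy}{N} - M_{x_2}M_{y_2} = \beta_{xy_1}\dfrac{\Sigma x^2}{N} + M_{x_2}M_{y_1} - \beta_{xy_1}M_{x_1}M_{x_2} - M_{x_2}M_{y_2}$.

Reducing (14) and substituting $\beta_{xy_2}$ for $\beta_{xy_1}$ we obtain

(15)  $\dfrac{\Sigma xy}{N} - M_{x_2}M_{y_2} = \beta_{xy_2}\left(\dfrac{\Sigma x^2}{N} - M^2_{x_2}\right)$.

This reduces to

(16)  $\dfrac{\Sigma xy}{N} - M_{x_2}M_{y_2} = \beta_{xy_2}\sigma^2_{x_2} = r_{xy_1}k\,\sigma_{y_1}\sigma_{x_2}$.

Substituting from (11) in the formula

(17)  $r_{xy_2} = \dfrac{\Sigma xy/N - M_{x_2}M_{y_2}}{\sigma_{x_2}\sigma_{y_2}}$

we obtain

(18)*  $r_{xy_2} = r_{xy_1}k[1 - (1 - k^2)r^2_{xy_1}]^{-1/2}$.
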
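* This formula has been derived by Kelley.

An expression for estimating the correlation between two
variables y and z in population (2) when (1) and (2) differ only
because of selection on x may be obtained as follows: If
$\bar{y} + (y - \bar{y})$ and $\bar{z} + (z - \bar{z})$ are substituted, respectively, for y and z

(19)  $\dfrac{\Sigma yz}{N} = \dfrac{\Sigma[\bar{y} + (y - \bar{y})][\bar{z} + (z - \bar{z})]}{N}$

which expands to

(20)  $\dfrac{\Sigma yz}{N} - M_{y_2}M_{z_2} = \dfrac{\Sigma \bar{y}\bar{z}}{N} + \dfrac{\Sigma \bar{y}(z - \bar{z})}{N} + \dfrac{\Sigma \bar{z}(y - \bar{y})}{N} + \dfrac{\Sigma (z - \bar{z})(y - \bar{y})}{N} - M_{y_2}M_{z_2}$.

But $\Sigma \bar{y}(z - \bar{z})$ and $\Sigma \bar{z}(y - \bar{y})$ vanish since $\bar{y}$ and $\bar{z}$ are
constants for any constant value of x while $\Sigma(y - \bar{y})$ and $\Sigma(z - \bar{z})$
add to zero for any constant value of x. Substituting
$\beta_{xy_1}x + M_{y_1} - \beta_{xy_1}M_{x_1}$ for $\bar{y}$ and
$\beta_{xz_1}x + M_{z_1} - \beta_{xz_1}M_{x_1}$ for $\bar{z}$ we obtain

(21)  $\dfrac{\Sigma yz}{N} - M_{y_2}M_{z_2} = \beta_{xy_1}\beta_{xz_1}\dfrac{\Sigma x^2}{N} + \beta_{xy_1}M_{z_1}\dfrac{\Sigma x}{N} - \beta_{xy_1}\beta_{xz_1}M_{x_1}\dfrac{\Sigma x}{N}$
      $+ \beta_{xz_1}M_{y_1}\dfrac{\Sigma x}{N} + M_{y_1}M_{z_1} - \beta_{xy_1}M_{x_1}M_{z_1} - \beta_{xy_1}\beta_{xz_1}M_{x_1}\dfrac{\Sigma x}{N}$
      $- M_{y_1}\beta_{xz_1}M_{x_1} + \beta_{xy_1}M_{x_1}\beta_{xz_1}M_{x_1} - M_{y_2}M_{z_2} + \dfrac{\Sigma (z - \bar{z})(y - \bar{y})}{N}$.

This reduces to

(22)  $\dfrac{\Sigma yz}{N} - M_{y_2}M_{z_2} = \beta_{xy_1}\beta_{xz_1}\left(\dfrac{\Sigma x^2}{N} - M^2_{x_2}\right) + \dfrac{\Sigma (z - \bar{z})(y - \bar{y})}{N}$.

But $r_{yz \cdot x} = \Sigma(y - \bar{y})(z - \bar{z}) \cdot (N\sigma_{y \cdot x}\sigma_{z \cdot x})^{-1}$ and

$\dfrac{\Sigma(y - \bar{y})(z - \bar{z})}{N} = r_{yz \cdot x}[1 - r^2_{xy}]^{1/2}\sigma_y[1 - r^2_{xz}]^{1/2}\sigma_z$

or $(r_{yz} - r_{xy}r_{xz})\sigma_y\sigma_z$. If the assumption is made that $r_{yz \cdot x}$,
$\sigma_{y \cdot x}$ and $\sigma_{z \cdot x}$ are the same for all constant values of x, it
follows that $(r_{yz} - r_{xy}r_{xz})\sigma_y\sigma_z$ is unaffected by selection on x and
further that $(r_{yz_1} - r_{xy_1}r_{xz_1})\sigma_{y_1}\sigma_{z_1} = (r_{yz_2} - r_{xy_2}r_{xz_2})\sigma_{y_2}\sigma_{z_2}$. If, in
addition, we assume again that $\beta_{xy_1}$ and $\beta_{xz_1}$ are equal to $\beta_{xy_2}$ and
$\beta_{xz_2}$, we obtain after substituting and reducing

(23)  $\dfrac{\Sigma yz}{N} - M_{y_2}M_{z_2} = [r_{yz_1} - (1 - k^2)r_{xy_1}r_{xz_1}]\sigma_{y_1}\sigma_{z_1}$.

From (11) and (23)

(24)  $r_{yz_2} = [r_{yz_1} - (1 - k^2)r_{xy_1}r_{xz_1}] \cdot [1 - (1 - k^2)r^2_{xy_1}]^{-1/2} \cdot [1 - (1 - k^2)r^2_{xz_1}]^{-1/2}$.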


The estimation of the frequency distribution of y in population
(2) may often be desired. An $f(y_2)$ value may be determined from
a bivariate frequency distribution of x and y values in population
(1) so long as (1) is not a truncated portion of (2). In such a
problem selection on x would be equivalent to the multiplication
of each entry of the columns of the bivariate frequency
distribution by weights, that is, doubling all entries in one column,
halving the entries in another column, etc. This follows directly
from the definition of selection on x since this definition precludes
changes in the frequency distribution of y (other than those due
to sampling) for a constant value of x. Hence, $f(y_2)$ values may
be obtained by weighting cell entries of columns of the bivariate
distribution in population (1) so that the column sums of $f(x_1)$
values are equal to corresponding $f(x_2)$ values and adding cell
entries of rows of the weighted distribution to obtain the
estimated $f(y_2)$ values.
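
The column-weighting procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, the layout of the table (rows are y classes, columns are x classes) and the counts are hypothetical, and every column of population (1) is assumed to be non-empty, that is, (1) is not a truncated portion of (2).

```python
def estimate_y_distribution(table_1, x_freqs_2):
    """Estimate f(y2) by reweighting columns of a bivariate table.

    table_1[i][j] is the number of cases in population (1) falling in
    y class i and x class j. x_freqs_2[j] is the known frequency of
    x class j in population (2).
    """
    n_rows, n_cols = len(table_1), len(table_1[0])
    # Column sums of the population (1) table give f(x1).
    x_freqs_1 = [sum(table_1[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # One weight per column, so that weighted column sums equal f(x2).
    weights = [x_freqs_2[j] / x_freqs_1[j] for j in range(n_cols)]
    # Row sums of the weighted table give the estimated f(y2).
    return [sum(table_1[i][j] * weights[j] for j in range(n_cols)) for i in range(n_rows)]


# Hypothetical 2 x 2 table: the first column is doubled, the second halved.
f_y2 = estimate_y_distribution([[10, 20], [30, 20]], [80, 20])
print(f_y2)
```

The weights leave the distribution of y within each column untouched, which is the defining property of selection on x only.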

If, as in the example cited in the first paragraph of this paper,
cell entries were missing but column sums are known for columns
below a certain x value, the problem of estimating the missing
entries from knowledge of the column sums would arise. It is
assumed, in other words, that x values for both populations are
known but that y values are known only for population (1).
However, since $\sigma_{y_2}$, $r_{xy_2}$, and $M_{y_2}$ may be estimated, the mean
and $\sigma$ of x values of an array having a constant y value can,
consequently, also be determined. If the x values within this
array are assumed to be distributed normally (or according to
some empirically determined function) the proportion of cases
to the point of truncation may then be directly computed from
the frequency function if the point of truncation is expressed
as a standard score of x values within that given array. The
number of cases in the complete population may be determined
by dividing this number by the proportion of cases in the
truncated population having the given y value.
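
One reading of this step is sketched below in Python. It assumes, as in the applicant example, that the cases retained are those at or above the critical score, and that x is normally distributed within the array. The function names and numbers are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def estimate_complete_count(observed_count, cut_score, array_mean_x, array_sd_x):
    """Estimate the number of cases in a y array before truncation on x.

    Assumes cases with x at or above cut_score were retained and that x
    is normal within the array with the given mean and standard deviation.
    """
    # Point of truncation expressed as a standard score within the array.
    z_cut = (cut_score - array_mean_x) / array_sd_x
    proportion_retained = 1.0 - normal_cdf(z_cut)
    return observed_count / proportion_retained


# Hypothetical array: 30 observed cases, cut at the array mean.
n_complete = estimate_complete_count(30, cut_score=50.0, array_mean_x=50.0, array_sd_x=10.0)
print(n_complete)
```

When the cut falls exactly at the array mean, half the array is retained, so the estimated complete count is twice the observed count.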

It should be evident to the reader that both Beta coefficients
can be readily determined from formulae already developed.
One of the basic assumptions in developing all of these formulae
is that the regression of y on x is linear. From the definition of
selection here employed it follows that selection on x will not
alter this regression equation. Hence $\beta_{xy_2}$ may be equated
directly to $\beta_{xy_1}$. $\beta_{yx_2}$ may be computed after $M_{y_2}$, $\sigma_{y_2}$, and
$r_{xy_2}$ are determined from, respectively, formulae (5), (11), and
(18). It is interesting in this connection that, although
alteration of a scatterplot of x and y by selection on x does not affect
the slope or linearity of the regression of y on x, the regression of
x on y becomes non-linear with selection on x. This is still true
in the case of regression of standard scores or, in other words, for
correlation surfaces. However, this point is possibly academic
in that generally the departure from linearity must be
considerable to have practical significance.


DISCUSSION 


Apart from the definition of selection, which does definitely
limit the problems to which these formulae may be applied, all
of the assumptions involved in the derivation of these formulae
are those normally made in partial and multiple regression
problems. When assumptions of linear regression of y on x, of
homoscedasticity, or of equal $r_{yz}$'s for constant values of x are
not exactly met, the estimated values still provide, as in regression
problems, an approximation which will suffice for most purposes.
However, the exactness of the approximation varies considerably
for different applications of these formulae. In the case of (18)
and (24), wherein k is less than 1, such approximations are similar
to those obtained with corresponding regression equations.
Likewise, when k is over 1 but the difference is due largely to selection
on x without truncation, such approximations are probably
adequate. When k is over 1 and truncation has occurred on x
the situation is different. Even though a departure from the
assumption of linear regression of y on x reduces negligibly the
accuracy of prediction over the total range, it may well be that
extrapolation of the regression line from a narrow range at one
extreme of the x distribution to the regression line over the entire
distribution may result in considerable error. Similar
considerations are pertinent with reference to the effect of
sampling error upon the slope of the regression line in a truncated
distribution.

Certain special applications may be suggested for a number of
these formulae. Since formula (5) has appeared in print a
number of times and is well known to statisticians it scarcely
seems worth while to comment on its application.

Formula (11) is not well known so far as the author is aware.
Note that (11) is in a sense a generalization of the formula for
the standard error of estimate (or that for the partial standard
deviation) and reduces to it in the special case wherein k equals 0.
It might also be mentioned here that unless the correlation
involved is rather high, the effect of the correction provided by
(11) on the standard deviation is not very great.
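
The behavior of formula (11) at its limits can be checked with a short Python sketch. The function names and the input values are hypothetical; k is taken as the ratio of the standard deviation of x in population (2) to that in population (1).

```python
import math


def estimate_sd_y2(sd_y1, r_xy1, k):
    """Formula (11): sd_y2 = sd_y1 * [1 - (1 - k^2) * r_xy1^2]^(1/2)."""
    return sd_y1 * math.sqrt(1.0 - (1.0 - k**2) * r_xy1**2)


def standard_error_of_estimate(sd_y1, r_xy1):
    """Usual standard error of estimate: sd_y1 * sqrt(1 - r^2)."""
    return sd_y1 * math.sqrt(1.0 - r_xy1**2)


# With k = 0 formula (11) coincides with the standard error of estimate.
print(estimate_sd_y2(10.0, 0.6, 0.0), standard_error_of_estimate(10.0, 0.6))
# With k = 1 (no change in the spread of x) the standard deviation is unchanged.
print(estimate_sd_y2(10.0, 0.6, 1.0))
# With a low correlation, even doubling the spread of x changes sd_y modestly.
print(estimate_sd_y2(10.0, 0.3, 2.0))
```

The last call shows the point made above: with a correlation of .3, doubling the standard deviation of x raises the standard deviation of y by only about thirteen per cent.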

Formula (18), while developed and published by Kelley,¹
has apparently rarely been employed, if one is to judge from
the content of current textbooks. There are several possible
applications of this formula which may be of interest to the
reader. One such application has been mentioned in the
introduction. Suppose though that selection were intentionally
made on x, as in the choice of a sample with a distribution
rectilinear on x or in use of only the upper and lower quarters
of a population. If, in such instances, an estimate of $r_{xy}$ were
desired for the complete population, (18) would provide it. It
should be emphasized that in applying (18) to the estimation of
‘true’ correlation from that obtained in widespread classes, $\sigma_{x_2}$
is computed from the complete x distribution (x is employed, as
throughout this article, to indicate the variable on which selection
occurs), while $r_{xy_1}$ and $\sigma_{x_1}$ are computed on all cases in the
‘mutilated’ population; that is, there is no separate computation
of these constants for the tails of the distribution. By the same
reasoning (18) may be employed to determine the correlation
that would be obtained if selection were made on a given variable
so as to normalize a distribution. Note though that it is the
f(x), not the x values that are normalized. This is not the same
as employing a multiplying factor on the variable x.
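
A small Python sketch of formula (18) may make the application concrete. The function name and the input values are hypothetical; k is the ratio of the standard deviation of x in the population for which the estimate is wanted to that in the population where the correlation was computed.

```python
import math


def estimate_r_xy2(r_xy1, k):
    """Formula (18): r_xy2 = r_xy1 * k * [1 - (1 - k^2) * r_xy1^2]^(-1/2)."""
    return r_xy1 * k / math.sqrt(1.0 - (1.0 - k**2) * r_xy1**2)


# Hypothetical case: r = .40 in a selected group whose spread on x is
# half that of the complete population, so k = 2.
r_complete = estimate_r_xy2(0.40, 2.0)
print(round(r_complete, 3))
```

With k equal to 1 the function returns the correlation unchanged, and with k below 1 the estimate is lower than the observed value, which corresponds to estimating from a wide range to a sub-portion of it.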

Formula (24) may be regarded as a semi-partial correlation
coefficient or as a generalized form of the partial correlation
coefficient in the same sense as (11) is a generalized form of the
standard error of estimate. When k is zero (24) reduces to the
formula for $r_{yz \cdot x}$. In considering (11) and (24) in this manner,
the formulae are employed to estimate from values in a given
range on x to those within a sub-portion of the distribution.
This is, of course, the opposite application from that initially
considered wherein estimates of the larger population were
determined from values computed within the sub-portion. It should
be emphasized that both applications are legitimate.

A number of formulae have been erroneously employed to
solve problems similar to those for which (18) and (24) were
developed. Thus

(25)  $\sigma_{y_1}\sqrt{1 - r^2_{xy_1}} = \sigma_{y_2}\sqrt{1 - r^2_{xy_2}}$,

derived by Kelley,¹ has been employed rather frequently in
estimating the correlation in a complete population from that
computed in a truncated portion and for the solution of various
other similar problems.

Consider again the example previously cited; namely, that in
which a population of applicants are administered a test x.
Here, a portion having test scores equal to or greater than the
critical score are accepted and assigned criterion (y) values.
The limited usefulness of the formula (25) in such situations
should be immediately evident. Both $r_{xy_2}$ and $\sigma_{y_2}$ would be
unknown in the example under consideration. If y (criterion
values) were known in the complete population, $r_{xy_2}$ would be
directly computed. At best, then, use of this equation can only
result in saving in computation labor.

The assumption that $\sigma_{x_1}\sqrt{1 - r^2_{xy_1}}$ is equal to $\sigma_{x_2}\sqrt{1 - r^2_{xy_2}}$
has often been made in attempting solution to such a problem.
That this assumption is untenable in the above example may be
readily shown by considering the case wherein $r_{xy_1}$ and $r_{xy_2}$
are zero. Here the equation reduces to $\sigma_{x_1} = \sigma_{x_2}$, which is
ridiculous since x is the variable on which truncation occurred.

Very frequently (25) has been employed to correct for
differences in heterogeneity between two populations when the
selective factors producing this difference are unknown. Formula
(24) was developed for the solution of such problems when the
selective factor is a known third variable. Substitution of .8
for $r_{xz_1}$, zero for $r_{xy_1}$, .6 for $r_{yz_1}$, and .3 for k would result in a
corrected coefficient of .92 (if k were zero the correlation, a
partial, would be 1.00). The phenomenon is analogous to that
resulting from action of ‘suppressor’ variables in multiple
correlation analysis. This example should suffice to demonstrate that
any formula which estimates the correlation between two
variables as lower when the standard deviation of one of them is
lowered because of unknown selective devices, is erroneous.
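
The numerical example above can be reproduced with a short Python sketch of formula (24). The function name is hypothetical. Evaluated in floating point, the expression comes to about .928 for k = .3, close to the value of .92 quoted in the text, and to 1.00 for k = 0.

```python
import math


def estimate_r_yz2(r_yz1, r_xy1, r_xz1, k):
    """Formula (24): correlation of y and z after selection on x only."""
    c = 1.0 - k**2
    numerator = r_yz1 - c * r_xy1 * r_xz1
    denominator = math.sqrt(1.0 - c * r_xy1**2) * math.sqrt(1.0 - c * r_xz1**2)
    return numerator / denominator


# Values from the example: r_xz1 = .8, r_xy1 = 0, r_yz1 = .6.
r_k_03 = estimate_r_yz2(r_yz1=0.6, r_xy1=0.0, r_xz1=0.8, k=0.3)
r_k_00 = estimate_r_yz2(r_yz1=0.6, r_xy1=0.0, r_xz1=0.8, k=0.0)
print(round(r_k_03, 3), round(r_k_00, 3))
```

Narrowing the spread of x raises the correlation between y and z here, even though the standard deviation of z is reduced, which is the suppressor-like effect the text describes.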

It should be emphasized, however, that none of these criticisms
apply to use of Kelley’s¹ formula for correcting reliability
coefficients for differences in heterogeneity

(26)  $\sigma_{y_1}\sqrt{1 - R_{yy_1}} = \sigma_{y_2}\sqrt{1 - R_{yy_2}}$

so long as selection has occurred on a variable other than that on
which the reliability is being determined. The problem here
differs markedly from those previously discussed in that 1 − R
is error variance which may reasonably be supposed to be equal
for constant ‘true’ scores, and to be uncorrelated with all selective
factors that might possibly have produced the change in $\sigma_y$.
Note though that (26) could not be applied if selection occurred
directly on y, since selection on y would mean selection
differential with respect to y on the error factor of y which is assumed in
(26) to be uninfluenced by selection. This follows from the
fact that scores of persons high on y are known to contain
positive errors while those of persons low on y are known to
contain negative errors.


THE RELATIONSHIP BETWEEN SCORES
ON A CLERICAL TEST
AND CLERICAL PROFICIENCY IN LIBRARY WORK

GRACE M. OBERHEIM
Iowa State College


The National Institute of Industrial Psychology Clerical Test 
(American Edition), hereafter called the Clerical Test, has been 
used as a device for predicting success of student assistants over 
a period of years at the Iowa State College Library, and the 
results of a study to determine the predictive value of Total 
Scores made on the Clerical Test with the criterion, the library 
rating, have been reported.§ 

The Clerical Test has been described in considerable detail 
elsewhere.*:*§ This test was chosen because it contained sub- 
tests sampling abilities similar to those needed for library work; 
for example, the ability to follow oral instructions; the ability to 
deal readily with numbers; the ability to file and copy accurately; 
and the ability to detect errors.* The tasks which student assist- 
ants perform are of a clerical nature. Guilford states that a test 
is a valid one for clerical aptitude if its scores correlate highly 
with later clerical proficiency.? If any or all of the subtests of 
the Clerical Test are valid measures of clerical ability such sub- 
tests should have a fairly high relationship with the criterion; 
and if any of the subtests measure clerical ability as distinguished 
from scholastic ability or ‘general intelligence,’ it would seem 
that such subtests might have a relatively higher correlation with 
the library rating than with the measures of academic success. 

The subjects of this study were sixty-nine college students who 
took the Clerical Test at the beginning of the Fall Quarter, 1939. 
A library rating was made for each assistant at the end of this 
quarter, and the grade was obtained from the Office of the
Registrar.†

* The subtests are as follows: I, Oral Instructions or Memory; II,
Classification; III, Arithmetic; IV, Copying; V, Checking for Errors; VI, Filing;
and VII, Problems.

† A more complete outline of the plan of the original investigation is
available in a thesis submitted by the writer in partial fulfillment of the
requirements for the degree of Master of Science in the Faculty of Library Service,
Columbia University, 1941, under the direction of Dr. Alice I. Bryan.

This article reports an effort to determine, from an analysis of
scores made on the subtests of the Clerical Test, which of the 
subtests have the highest relationship to library work as measured 
by the library rating; whether or not scores on any of the sub- 
tests have a higher relationship to library work as measured by 
the rating than to academic achievement as measured by the 
college grade; and whether or not there are significant differences 
between scores made on the subtests by men and women. 

The statistical measures used in analyzing the data included a 
study of the significance of means for the two groups, men and 
women, and of the significance of correlation coefficients. Elabo- 
rate treatment of the data has not been attempted, but rather an 
effort has been made to present the data available with some 
discussion of the results of the study. 


PRESENTATION OF RESULTS 


In Table I the mean differences between scores for men and
women on the subtests of the Clerical Test, the college grade, the
library rating and the Total Score of the Clerical Test are
presented. The women made higher scores on subtests II, IV, and
V than the men. According to Snedecor’s Table 3.8, the mean
differences on these tests are highly significant, or significant at
the one-per-cent level.’ The women made significantly higher
scores than the men on the Total Test, the differences being
significant at the five-per-cent level. The mean differences
between the two groups were not found to be significant for the
other subtests, I, III, VI, and VII, or for the college grade or
library rating.

TABLE I. SIGNIFICANCE OF MEAN DIFFERENCES ON VARIABLES FOR MEN AND WOMEN

                                                                                  Library  Total
                  N    I       II      III     IV      V       VI      VII     Grade   Rating   Score
Men               46   5.46    8.76    11.98   6.30    6.39    17.00   17.04   2.30    23.10    72.72
Women             23   6.65    11.70   11.91   9.48    9.09    18.30   17.17   2.19    23.83    84.30
Diff. of Means         1.19    2.94    .07     3.18    2.70    1.30    .13     .11     .73      11.58
SE of Diff.            .62     .74     1.53    1.03    .86     1.43    1.06    .18     1.54     4.51
t                      1.92    3.97†   .05     3.09†   3.14†   .91     .12     .61     .47      2.57*

* indicates significance at the five-per-cent level.
† indicates significance at the one-per-cent level.
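
The t values in Table I are the ratios of each difference of means to its standard error. The following Python sketch recomputes them from the tabled differences and standard errors. The variable names are illustrative, and the sketch does not reproduce the significance lookup in Snedecor's table.

```python
# Differences of means and their standard errors, transcribed from Table I.
diff_of_means = {
    "I": 1.19, "II": 2.94, "III": 0.07, "IV": 3.18, "V": 2.70,
    "VI": 1.30, "VII": 0.13, "Grade": 0.11, "Library Rating": 0.73,
    "Total Score": 11.58,
}
se_of_diff = {
    "I": 0.62, "II": 0.74, "III": 1.53, "IV": 1.03, "V": 0.86,
    "VI": 1.43, "VII": 1.06, "Grade": 0.18, "Library Rating": 1.54,
    "Total Score": 4.51,
}

# t is the difference of means divided by the standard error of the difference.
t_ratios = {name: round(diff_of_means[name] / se_of_diff[name], 2)
            for name in diff_of_means}

for name, t in t_ratios.items():
    print(f"{name:15s} t = {t:.2f}")
```

The recomputed ratios agree with the t row of the table, including the three subtests (II, IV, V) and the Total Score that are marked as significant.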

Product moment correlation coefficients between the subtests
and the college grade, the library rating and the Total Score of the
Clerical Test, as well as intercorrelations between the subtests
of the Clerical Test are shown in Table II for the men and
in Table III for the women. Correlation coefficients which are
significant according to Snedecor’s Table 7.2 are indicated.’

TABLE II. CORRELATIONS OF SUBTESTS WITH GRADE, LIBRARY
RATING, TOTAL SCORE AND INTERCORRELATIONS: MEN
N = 46

                                                                                        Library  Total
Item            1        2        3        4        5        6        7        Grade    Rating   Score
1.                       .25      .42†     .03      .58†     .30      .45†     .33*     .43†     .62†
2.                                .14      .19      .17      .41†     .49†     .27      .32*     [?]
3.                                         .01      .26      .38†     .49†     .43†     .42†     .69†
4.                                                  -.10     .09      .10      .11      .13      .29*
5.                                                           [?]      .21      .27      .56†     .51†
6.                                                                    .46†     .57†     .51†     .79†
7.                                                                             .48†     .44†     .75†
Grade                                                                                   .53†     .61†
Library Rating                                                                                   .66†

* indicates significance at five-per-cent level; † at one-per-cent.
[?] marks an entry illegible in the source.


TABLE III. CORRELATIONS OF SUBTESTS WITH GRADE, LIBRARY
RATING, TOTAL SCORE AND INTERCORRELATIONS: WOMEN
N = 23

                                                                                        Library  Total
Item            1        2        3        4        5        6        7        Grade    Rating   Score
1.                       .00      .43*     .31      .26      .31      .26      .21      .43*     .64†
2.                                .22      .24      -.22     .19      .56†     .38      .44*     .39
3.                                         .22      .23      .38      .35      .24      .35      .79†
4.                                                  -.08     .27      .02      .10      .17      .50*
5.                                                           -.26     -.08     .13      .06      [?]
6.                                                                    .36      .16      .25      .58†
7.                                                                             .45*     .46*     .64†
Grade                                                                                   [?]      .42*
Library Rating                                                                                   .54†

* indicates significance at five-per-cent level; † at one-per-cent.
[?] marks an entry illegible in the source.


When tests are combined into a battery, the tests should meas- 
ure different aspects of the same general capacity. The ideal 
subtest is one which possesses general relationship to the battery 
of tests; which has high relationship with the criterion; and which 
has low and positive relationship with the other tests. If subtests
have high intercorrelations they are likely to be duplicating
one another in predicting the criterion. In addition a subtest 
should have a higher correlation with the Total Test, of which it 
is a part, than with the other subtests, as an indication that it is 
contributing its part to whatever the Total Test measures. 

If we use Garrett’s statement of the interpretation of the size 
of r, we find that in the relationship of the subtests to the college 
grade and to the library rating, the correlation coefficients in 
Tables II and III range from negligible relationship to substan- 
tial relationship. 


TABLE IV. CORRELATION COEFFICIENTS BETWEEN THE SUBTESTS
AND TOTAL SCORE OF THE CLERICAL TEST AND THE
LIBRARY RATING
(CORRECTED FOR ATTENUATION)

                                                                        Total
            N     1      2      3      4      5      6      7      Score
Men         46    .49†   .37*   .48†   .15    .64†   .58†   .50†   .75†
Women       23    .49*   .50*   .40*   .19    .07    .28    .52†   .62†

* indicates significance at the five-per-cent level.
† indicates significance at the one-per-cent level.

In Table II, for the men, we find that correlation coefficients
on subtests I, III, V, VI, and VII are highly significant with the
rating; Test II is significant, but not highly so; and Test IV is
not significant. The correlation coefficients range from .13 to
.56 for the subtests with the rating. Test V has the highest
correlation with the rating, .56, and Test VI is second highest
with an r of .51. When the correction for attenuation is made in
Table IV, these coefficients are raised to .64 and .58, respectively.
Tests I, III, and VII have very much the same relationship to the
rating as far as the size of r is concerned (.43, .42, and .44,
respectively). Test II has next to the lowest relationship with an r of
.32 while Test IV has the lowest relationship with an r of .13.

Correlation coefficients between subtests III, VI, and VII and 
the college grade are highly significant, while Test I is signifi- 
cant, but not highly so, and Tests II, IV, and V are not signifi- 
cant. Test VI has the highest relationship to the grade with an 
r of .57, with Test VII second with an r of .48. 

A comparison of the relationship of the subtests to the rating 
and to the college grade indicates that for the men in this group 
Tests II and V have significant relationship with the rating but
not with the college grade, and that Test I has highly significant
relationship with the rating and significant relationship with the 
grade. For this group, these three subtests might be considered 
to have higher predictive value for the library rating than for the 
college grade. Tests V and VI have the highest predictive value 
for the rating while Tests VI and VII have the highest predictive 
value for the grade. 

The highest intercorrelation coefficients between the subtests 
may be read for each test in Table II. For example, Test I has 
highest intercorrelation coefficients with Tests V, VII and III, 
i.e., .58, .45, and .42. Test III is highest with Tests VII and II. 
Test VII is highest with III, II, VI and I in the order given. Test
IV has no high relationship with the other subtests. Test V 
seems to be a good test for this group since it has high relation- 
ship with the rating and low intercorrelation with the other tests, 
except Test I. Test IV has low relationship to the criterion, and 
does not test high with any other test which does have high rela- 
tionship to the criterion. It seems to be the least valuable of the 
tests in predicting the rating. Test VII, which resembles a gen- 
eral intelligence test, has highly significant relationship with 
Tests III, II, VI, and I and these tests are no doubt duplicating 
one another. Test III, Arithmetic, seems to be duplicating with 
VII, Problems, as one would expect, and also with I, Memory, 
and II, Classification. Test II has a considerably higher inter- 
correlation with Test VII than with the rating. This test shows 
high significance with Test VII while it is only significant with 
the rating, and not significant with the college grade. 

In Table III, for the women, correlation coefficients for Tests 
I, II, an VII are significant with the library rating, but not highly 
so. When the correction for attenuation is made in Table IV, 
Test VII shows high significance and Test III shows significance. 
Test VII is the only one of the subtests which is significantly 
correlated with the college grade. Test V, which had the highest 
correlation with the rating for the men, is lowest with the rating 
for the women and is negatively correlated with Tests II, IV, VI, 
and VII. The highest intercorrelations are .56 between Tests II 
and VII, and .43 between Tests I and III. Tests I and II have
higher predictive value for the rating for this group than for the 
grade, while Test VII seems to predict equally well for both rating
and grade.

Since the library rating is not a perfect instrument, it may
appear that scores on the subtests and the Total Score of the 
Clerical Test are less valid for prediction of success than they 
really are. Using the reliability coefficient of .77 obtained 
between two sets of raters for the library rating, correlation 
coefficients for the subtests and the Total Score have been cor- 
rected for attenuation and shown in Table IV for men and 
women. When the correction for attenuation is made, all but 
four of the fourteen correlation coefficients between the subtests 
of the Clerical Test and the library rating show marked relation- 
ship, and the coefficients of .75 and .70 for the Total Score and 
the rating indicate high relationship for both groups, men and 
women. 
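
The correction described above can be illustrated with a short sketch. It assumes the usual correction for attenuation applied to the criterion only, in which an observed coefficient is divided by the square root of the criterion reliability; the text does not state the exact formula used, so this is an assumption. The function name is hypothetical, and the observed value of .56 is the coefficient reported in the Summary for subtest V and the library rating among the men.

```python
import math


def correct_for_criterion_attenuation(r_observed, criterion_reliability):
    """Correct an observed validity coefficient for unreliability of the criterion.

    Assumed formula: r_corrected = r_observed / sqrt(criterion_reliability).
    Only the criterion (here, the library rating) is treated as unreliable.
    """
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("criterion reliability must lie in (0, 1]")
    return r_observed / math.sqrt(criterion_reliability)


# .77 is the inter-rater reliability of the library rating given in the text;
# .56 is the uncorrected coefficient for subtest V among the men.
corrected = correct_for_criterion_attenuation(0.56, 0.77)
print(round(corrected, 2))
```

Because the divisor is less than one, every corrected coefficient is larger than its uncorrected counterpart, which is why more of the subtests reach significance after the correction.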


SUMMARY 


The results of this study indicate: 

1) That for the group of men coefficients for subtests I, III, V, 
VI and VII are highly significant with the library rating; that 
subtest II is significant but not highly so and that subtest IV is 
not significant. 

2) That for the group of women coefficients for subtests I, II 
and VII are significant with the library rating, but not highly so; 
that when coefficients between the subtests and the rating are 
corrected for attenuation, subtest VII shows high significance
and subtest III is raised to significance; that subtest V which is 
highest for the men is lowest for the women. 

3) That subtest V is the best of the tests for predicting the 
library rating for the group of men with an r of .56 while subtest 
VI is second with an r of .51. 

4) That Test VII is the best of the subtests for predicting the
library rating for the group of women with an r of .46, with Tests
II and I competing for second place with r’s of .44 and .43,
respectively.

5) That subtests I, II and V have higher predictive value for 
the library rating than for the college grade for the men in this 
group. Test II is not significant with the grade, however. 

6) That for the women, coefficients between the subtests and 
the college grade are not significant and that the coefficient 
between the grade and the library rating is not significant. The 
fact that these coefficients are not significant is a warning that 
one is likely to obtain very different results for another group of
women, and that conclusions should not be drawn for this group 
from coefficients which do not show significance. 

7) That the women made significantly higher scores on sub-
tests II, IV and V than the men. No significant differences were
found between the men and the women on the other subtests, 
the college grade, or the library rating. 


CONCLUSIONS 


The results of this study indicated that the Clerical Test was 
a good test for predicting clerical aptitude of student assistants 
in the Iowa State College Library; that certain subtests predicted 
success better than others; that the same subtests did not predict 
equally well for men and women; that the Clerical Test predicted 
success for the men somewhat better than the college grade; that 
since the correlation coefficient between the college grade and the 
library rating was not significant for the women, conclusions 
should not be drawn in regard to the predictive value of the 
Clerical Test versus the college grade for the women; and that 
the women made significantly higher scores on some of the sub- 
tests than the men. 
A NOTE ON CORRECTING RELIABILITY
COEFFICIENTS FOR RANGE 


FREDERICK B. DAVIS 


Cooperative Test Service of the American Council on Education


A method for estimating the reliability coefficient of a test in 
one range of talent, knowing it in another, was presented many 
years ago by T. L. Kelley.¹ In 1930, P. J. Rulon provided a con-
venient graph for making use of Kelley’s formula.² There is,
however, a situation that arises occasionally in educational 
research to which Kelley’s method is not applicable. This is the 
case when it is desired to estimate the reliability coefficient of a 
test in a sample curtailed with respect to a variable correlated 
with the test. Conversely, we may wish to estimate the reli- 
ability coefficient of a test in a sample free from the effects of 
curtailment with respect to a variable correlated with the test. 
Kelley’s method is applicable when the standard deviations of 
the test itself in both the restricted and unrestricted samples are 
known and the reliability coefficient has been calculated in one 
of the two samples. The method described below is applicable, 
however, even if the test itself has never been administered in one 
or the other of the samples. The required data are the reliability 
coefficient of the test and the correlation of the test and the 
restricting variable in one of the two samples plus the standard 
deviations of the restricting variable in both samples. It is 
assumed that the standard error of estimate is constant through- 
out the restricted and unrestricted ranges. 

The general formula for correcting a correlation coefficient for 
restriction of range with respect to a third variable with which 
the two variables are each correlated³ may be rewritten to apply
to the case in which we are particularly interested. 





¹ T. L. Kelley, Statistical Method. New York: Macmillan Co., 1924,
p. 222.

² P. J. Rulon, “A Graph for Estimating Reliability in One Range, Know-
ing It in Another,” J. Educ. Psychol., XXI (February, 1930), pp. 140-142.

³ Cf. K. Pearson, “Mathematical Contributions to the Theory of Evolu-
tion—XI. On the Influence of Natural Selection on the Variability and
Correlation of Organs,” Philosophical Transactions of the Royal Society of
London, Series A, Vol. 200 (March, 1903), pp. 1-66.
Let:

$x_1$ = the restricting variable for which the standard deviation
in the restricted sample (denoted $\sigma_1$) and the standard
deviation in the unrestricted sample (denoted $\Sigma_1$) are
known.

$x_2$ = the variable of which it is desired to ascertain the reli-
ability coefficient in a sample restricted with respect to
variable $x_1$.

$x_4$, $x_6$ = equivalent halves of $x_2$.

$r_{2\mathrm{II}}$ = the reliability coefficient of variable $x_2$ in a sample
restricted with respect to variable $x_1$.

$R_{2\mathrm{II}}$ = the reliability coefficient of variable $x_2$ in a sample free
from restriction with respect to variable $x_1$.

$r_{12}$ = the correlation of variables $x_1$ and $x_2$ in a sample
restricted with respect to variable $x_1$.

$r_{14}$, $r_{16}$ = the correlations of variable $x_1$ and equivalent halves
of variable $x_2$ in a sample restricted with respect to
variable $x_1$.
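Then:

\[
R_{46} = \frac{r_{46} + r_{14} r_{16} \left( \dfrac{\Sigma_1^2}{\sigma_1^2} - 1 \right)}
{\sqrt{\left[ 1 + r_{14}^2 \left( \dfrac{\Sigma_1^2}{\sigma_1^2} - 1 \right) \right]
\left[ 1 + r_{16}^2 \left( \dfrac{\Sigma_1^2}{\sigma_1^2} - 1 \right) \right]}}
\]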

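Since the two halves of the test are equivalent, this may be sim-
plified to

\[
R_{46} = \frac{r_{46} + r_{14}^2 \left( \dfrac{\Sigma_1^2}{\sigma_1^2} - 1 \right)}
{1 + r_{14}^2 \left( \dfrac{\Sigma_1^2}{\sigma_1^2} - 1 \right)} \tag{A}
\]

To obtain the desired reliability coefficient, $R_{46}$ may be cor-
rected by means of the Spearman-Brown formula, as follows:

\[
R_{2\mathrm{II}} = \frac{2 R_{46}}{1 + R_{46}}
\]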


The methods of obtaining numerical values for the terms in
formula (A) are obvious except for the term $r_{14}$. The derivation
of this term is, therefore, given here. If $x_1$ and $x_2$ are deviation
scores, $x_2 = x_4 + x_6$, and

\[
r_{12} = r_{1(4+6)} = \frac{\sum x_1 (x_4 + x_6)}{\sqrt{\sum x_1^2 \, \sum (x_4 + x_6)^2}}
\]

Therefore,

\[
r_{12} = \frac{\sigma_4 r_{14} + \sigma_6 r_{16}}{\sqrt{\sigma_4^2 + \sigma_6^2 + 2 \sigma_4 \sigma_6 r_{46}}}
\]

Since the two halves of the test are equivalent,

\[
r_{12} = \frac{2 \sigma_4 r_{14}}{\sqrt{2 \sigma_4^2 + 2 \sigma_4^2 r_{46}}}
\]

Simplifying and solving for $r_{14}$, we obtain

\[
r_{14} = r_{16} = \frac{r_{12} \sqrt{2 (1 + r_{46})}}{2} \tag{B}
\]

Occasionally, in calculations in which the correlations of equiv- 
alent halves of a test are involved, it is necessary to make use of 
the numerical value of the standard deviation of each half of the 
test. It may easily be shown that the standard deviation of each 
of two identical halves of a test is as follows: 


\[
\sigma_4 = \sigma_6 = \frac{\sigma_2}{\sqrt{2 (1 + r_{46})}} \tag{C}
\]

where $r_{46}$ is the correlation of the two halves.

The use of formula (A) may be illustrated by a practical 
example. Let us say that all of the pupils taking first-year 
French in a certain high school were divided into three relatively 
homogeneous sections on the basis of scores on a general intelli- 
gence test. At the end of the first term a French test was admin- 
istered to the pupils in the top section. The reliability coefficient 
of the resulting scores has been computed. To estimate the 
reliability coefficient of the test if all of the first-year French stu- 
dents had been tested, it is necessary only to calculate the cor- 
relation of the French test and the general intelligence test among 
the pupils in the top section and the standard deviations of the 
general intelligence test scores in the top section and in all three 
sections taken together. 
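
The computation outlined in this example may be sketched in a few lines of Python. The sketch follows formulas (B) and (A) and the Spearman-Brown step in that order. All function names are hypothetical, and the numerical values (a half-test correlation of .60, a correlation of .50 between the French test and the intelligence test in the top section, and standard deviations of 8 and 16 on the intelligence test) are invented purely for illustration; they are not data from any study.

```python
import math


def half_from_full(r_full):
    """Invert the Spearman-Brown formula: half-test correlation from full-test reliability."""
    return r_full / (2.0 - r_full)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test reliability."""
    return 2.0 * r_half / (1.0 + r_half)


def correlation_with_half(r12, r46):
    """Formula (B): correlation of the restricting variable with either half."""
    return r12 * math.sqrt(2.0 * (1.0 + r46)) / 2.0


def unrestricted_half_correlation(r46, r14, sd_restricted, sd_unrestricted):
    """Formula (A): half-test correlation in the unrestricted sample."""
    k = r14 ** 2 * ((sd_unrestricted / sd_restricted) ** 2 - 1.0)
    return (r46 + k) / (1.0 + k)


def unrestricted_reliability(r46, r12, sd_restricted, sd_unrestricted):
    """Chain (B), (A), and Spearman-Brown to estimate the unrestricted reliability."""
    r14 = correlation_with_half(r12, r46)
    big_r46 = unrestricted_half_correlation(r46, r14, sd_restricted, sd_unrestricted)
    return spearman_brown(big_r46)


# Hypothetical illustration only: top-section values assumed for the example.
estimate = unrestricted_reliability(r46=0.60, r12=0.50,
                                    sd_restricted=8.0, sd_unrestricted=16.0)
print(round(estimate, 3))
```

When the two standard deviations are equal, the correction term vanishes and the result reduces to the ordinary Spearman-Brown reliability in the restricted sample. If only the stepped-up reliability in the restricted sample is available, `half_from_full` recovers the half-test correlation needed as input.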
The Wechsler-Bellevue Intelligence Test is an individual test of
general intelligence with many conspicuous advantages. It is 
compact and easily administered. It consists of eleven serial 
items of distinct psychological interest and, in addition, offers a 
verbal rating, a performance rating, and a general rating. Its 
structure thus lends itself to clinical evaluation. It takes less 
time to administer than a test such as the Revised Stanford- 
Binet Examination and takes considerably less time to administer 
than the usual battery of tests consisting of both verbal and non-
verbal tests. The range in age of the standardization population
and the adjustment of the IQ to the age factor are also of definite
advantage.

The test adequately fulfills most criteria of reliability so that it 
may be regarded as a reliable test of what it measures. A major 
question, however, is the extent to which it is a valid measure of 
intelligence. While not necessarily the only or even the major 
criterion of its efficacy as a measure of intelligence, the test should 
correlate highly with accepted measures of intelligence. 

There have been several studies which have investigated the 
relationship between IQ’s derived from the Revised Stanford- 
Binet Examination, Form L, and those from the Wechsler- 
Bellevue Scale. The experimental populations covered include 
adult mental hospital patients':?:*> and children classified as 
behavior problems or delinquents®*:>, Wechsler himself‘ pre- 
sents a correlation between Stanford-Binet Examination (form 
unstated) and Wechsler-Bellevue Test IQ for seventy-five 
adolescents. 

Nine correlations between IQ’s of the Revised Stanford-Binet 
Examination, Form L, and the Wechsler-Bellevue Test IQ’s 
ranged between .81 and .93, mean r .89. The correlations thus 
all tended to be high. Differences between mean IQ’s tended to 
be small and insignificant. There seemed to be a tendency for 
bright subjects to test higher on the Revised Stanford-Binet 
Examination and for the dull subjects to test higher on the
Wechsler-Bellevue Test. There also seemed to be a tendency for 
the younger subjects to attain higher IQ’s on the Revised Stan- 
ford-Binet Examination and for the older subjects to test higher 
on the Wechsler-Bellevue Test. In two studies!® it was assumed 
that differences in IQ were to be explained simply by difference 
in test dispersion, and regression formulas were offered as the 
basis for predicting the IQ in one of the tests from a knowledge 
of the IQ in the other. All the studies, however, dealt with such 
deviate populations that the results could not be regarded as 
typical or final for all groups. For example, in the studies of the 
younger problem-behavior cases, there was a tendency for 
Wechsler-Bellevue Performance IQ’s to exceed the Wechsler- 
Bellevue Verbal and Total IQ’s as well as the Stanford-Binet IQ’s. 
This was probably truly descriptive of the mental patterns of the 
groups studied rather than simply a function of difference in test 
dispersion. 

The purpose of this study is to present briefly additional data 
on the relationships between results in the Revised Stanford- 
Binet, Form L, and Wechsler-Bellevue Scale. The data have 
been derived from an adolescent group whose adjustment and 
cultural background differ from that of the previously reported 
groups whose Stanford-Binet and Wechsler-Bellevue IQ’s have 
been contrasted. The present experimental population consists 
of sixty adolescents, thirty-five boys and twenty-five girls, at a 
mean age of fourteen years, six months (Sigma 17.98 months, range
eleven years, four months to seventeen years, ten months). All 
have been dependent children for varying periods of time and all 
were in foster homes when studied. The children were examined 
for purposes of educational guidance. Table 1 presents the mean 
IQ’s of the subjects. 


TABLE 1.—MEAN IQ’S OF SUBJECTS

Test                                Mean IQ   Sigma   Range
Revised Stanford-Binet, L........    95.02    16.30   59-146
Full Wechsler-Bellevue...........    90.35    15.54   47-126
Verbal Wechsler-Bellevue.........    93.64    14.97   53-122
Performance Wechsler-Bellevue....    89.04    15.65   55-123


In the Weider et al.⁵ study of problem children, the Wechsler-
Bellevue Performance IQ tends to be higher than the Wechsler-
Bellevue Verbal IQ. Halpern² finds the same to be true in her
ten-to-fourteen-year-old group of children where the Performance
IQ was higher in ninety-five per cent of the cases. In the present
group, the mean Verbal IQ is highest. In sixty-two per cent of
the cases, the Verbal IQ is higher than the Performance IQ. In
thirty per cent the Performance IQ is higher. In eight per cent
of the cases there is no difference. It is clear that we are here
dealing with a population that is different in mental pattern from
the problem adolescents previously studied.


TABLE 2.—DISCREPANCIES BETWEEN STANFORD-BINET IQ AND
WECHSLER-BELLEVUE IQ’S

                                                Wechsler-Bellevue
                                     Full IQ       Verbal IQ    Performance IQ
                                   No.  Per Cent  No.  Per Cent  No.  Per Cent
Stanford-Binet IQ higher than....   42     70      34     57      43     72
Stanford-Binet IQ lower than.....   16     26      24     40      15     25
No discrepancy...................    2      4       2      4       2      4
Stanford-Binet IQ higher by more
  than 10 points.................   16     26       6     10      24     40
Stanford-Binet IQ lower by more
  than 10 points.................    3      5       5      8       8     13
Discrepancy less than 10 points..   41     68      49     82      28     47

Correlations between the Revised Stanford-Binet, Form L, IQ 
and the three Wechsler-Bellevue IQ’s are as follows:


with full Wechsler-Bellevue IQ.............. .86 
with Verbal Wechsler-Bellevue IQ........... .80 
with Performance Wechsler-Bellevue IQ...... .67 


By evaluating the significance of a product-moment correlation
coefficient through the formula $t = \dfrac{r}{\sqrt{1 - r^2}} \sqrt{N - 2}$, it is
estimated that for a population of sixty a value of r over .33 is
significant at the one-per-cent level. All the above correlations
are consequently significant.
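
The significance check just described can be reproduced with a brief sketch. It evaluates t for each reported correlation and inverts the same formula to find the smallest r that reaches a given critical t. The critical value of 2.663 is an assumption taken from a standard two-tailed t table for 58 degrees of freedom at the one-per-cent level; the function names are hypothetical.

```python
import math


def t_from_r(r, n):
    """t statistic for a product-moment r based on n pairs (n - 2 degrees of freedom)."""
    return r / math.sqrt(1.0 - r * r) * math.sqrt(n - 2)


def critical_r(t_critical, n):
    """Smallest |r| whose t statistic reaches t_critical with n pairs."""
    df = n - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)


# Assumed table value: two-tailed 1 per cent point of t for 58 degrees of freedom.
T_CRITICAL_01 = 2.663
N = 60

# Correlations of the Stanford-Binet IQ with the three Wechsler-Bellevue IQ's.
reported = {"Full": 0.86, "Verbal": 0.80, "Performance": 0.67}
for label, r in reported.items():
    t_value = t_from_r(r, N)
    print(label, round(t_value, 2), t_value > T_CRITICAL_01)

print(round(critical_r(T_CRITICAL_01, N), 2))
```

Under the assumed critical value, the smallest significant r for sixty cases comes out at about .33, which agrees with the figure given in the text.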

Discrepancies between the Revised Stanford-Binet IQ and the 
Wechsler-Bellevue IQ’s are shown in Table 2. 
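
A tabulation of this kind can be sketched as follows. The individual scores are not given in the text, so the five score pairs are invented for illustration, and the function names are hypothetical. The sketch also assumes that a difference of exactly ten points is counted with the smaller discrepancies, a point the table headings leave open.

```python
def tabulate_discrepancies(pairs, margin=10):
    """Count directions and sizes of discrepancy for (Stanford-Binet, Wechsler) IQ pairs."""
    counts = {
        "higher": 0,
        "lower": 0,
        "no_discrepancy": 0,
        "higher_by_more_than_margin": 0,
        "lower_by_more_than_margin": 0,
        "within_margin": 0,
    }
    for stanford_binet, wechsler in pairs:
        diff = stanford_binet - wechsler
        # Direction of the discrepancy.
        if diff > 0:
            counts["higher"] += 1
        elif diff < 0:
            counts["lower"] += 1
        else:
            counts["no_discrepancy"] += 1
        # Size of the discrepancy (exactly `margin` points counts as within, by assumption).
        if diff > margin:
            counts["higher_by_more_than_margin"] += 1
        elif diff < -margin:
            counts["lower_by_more_than_margin"] += 1
        else:
            counts["within_margin"] += 1
    return counts


def as_per_cent(counts, n):
    """Express each count as a whole-number percentage of n cases."""
    return {key: round(100.0 * value / n) for key, value in counts.items()}


# Hypothetical score pairs, for illustration only.
sample = [(104, 98), (91, 95), (110, 110), (128, 112), (85, 97)]
counts = tabulate_discrepancies(sample)
print(counts)
print(as_per_cent(counts, len(sample)))
```

The first three counts and the last three counts each sum to the number of cases, which offers a quick check on any table built this way.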

There is a clear-cut trend for the Stanford-Binet IQ to be higher
than the Full, Verbal and Performance IQ’s. It is apparent that 
the discrepancy is most marked in relation to Wechsler-Bellevue 
Performance IQ and least marked in relation to Wechsler- 
Bellevue Verbal IQ. The contrast between Stanford-Binet and 
Full Wechsler-Bellevue IQ’s at various levels is shown in Table 3. 


TABLE 3.—CONTRAST BETWEEN STANFORD-BINET IQ AND FULL
WECHSLER-BELLEVUE IQ AT VARIOUS LEVELS

                      Stanford-Binet    Stanford-Binet         No
Full Wechsler-            higher            lower          discrepancy
Bellevue IQ           No.  Per Cent     No.  Per Cent     No.  Per Cent
Below 75...........     5      8          3      5
75-89..............     8     13          5      8          2      3
90-109.............    19     32          8     13
110-119............     6     10
120 and above......     4      7
Total..............    42     70         16     26          2      3

In contrast to previous studies, the Wechsler-Bellevue IQ in this
investigation tends to be lower at all intelligence levels. It is
true that this trend is clearest among the superior children with
Wechsler-Bellevue IQ’s 110 or higher. This would jibe with our
practical experience that the Wechsler-Bellevue Test tends to be 
poorly discriminating in groups of superior adolescents. The 
problem of differentiating levels of superiority is an important 
one, for example, in any scholarship program, and seems to be 
better met with the Revised Stanford-Binet, Form L. 
SUMMARY


This study confirms the significant correlations between the 
Wechsler-Bellevue IQ ratings and the IQ obtained with the 
Revised Stanford-Binet Examination, Form L. Discrepancies 
have been noted, such as the general tendency for Wechsler- 
Bellevue IQ to be lower at all levels and the relative inefficacy of 
the Wechsler-Bellevue Test where one’s purpose is to discrimi- 
nate among a group of superior adolescents. In view of some 
differences in trend between our adolescent group and previous 
ones studied, however, it would seem as though differences 
between the two contrasted tests are only partially explained 
by differences in test dispersion. The discrepancies also seem 
to be genuinely descriptive of the mental patterns of groups 
studied. It is patent that a single regression formula derived 
from a small, selected sampling cannot have general applicability 
to all populations. 
BOOK REVIEWS


G. L. BOND AND E. BOND. Teaching the Child to Read. New
York: Macmillan Co., 1943, pp. 356. 


In preparing this text for teachers in the elementary school, the 
authors present a workable program for teaching reading, one 
which is based upon both research findings and upon knowledge 
of requirements in the classroom. Technicalities are either 
omitted or presented in understandable prose. Separate 
sections are devoted to initial adjustment of the child to the 
school situation, preparation for reading, teaching in the initial 
stages of reading, and development of independent, extensive 
reading. 

Certain noteworthy aspects of the presentation follow: (1) 
Throughout the program there is emphasis upon individuali- 
zation in teaching. (2) Adequate attention is devoted to the 
diagnosis and development of reading readiness. (3) The view 
that reading is developmental is stressed throughout the program. 
(4) The advice that reading instruction be fundamentally 
purposeful, topical reading and that other methods be employed 
as teaching techniques to solve problems and avoid dangers 
inherent in the purposeful method is sound. (5) The evaluation 
of techniques for learning vocabulary is, for the most part, 
excellent. (6) The use of eye-movement pacing by means of
marked copy, the metronoscope, film devices, etc., is properly
de-emphasized. (7) Strict control of vocabulary difficulty is
advised. (8) Development of comprehension abilities in the 
middle grades is stressed. (9) Problems in reading in the content 
subjects are thoroughly considered. (10) Appraisal of reading 
abilities is adequate. 

It seems to the reviewer that certain additions to or modi- 
fications of the material in this text would enhance its value to 
the teacher: (1) Although the authors do an excellent job in 
stating what should be done to achieve effective teaching, they 
frequently fail to indicate clearly how it is to be done. (2) 
While it is true that speed of reading is not stressed, the authors 
fail to point out adequately the harmful effects that may arise 
from the common practice of excessive drill to increase speed. 
(3) Perhaps more emphasis should be given to the integration of 
oral and silent reading from the beginning of instruction. (4) 
The authors are possibly over-optimistic concerning the value of
using small words within larger words in word-recognition. Even 
in demonstration classes, the reviewer has never observed a 
session where the technique was not wrongly employed by 
pupils. (5) A more frequent citation of aids to teachers would 
be helpful. (6) Possibly a more specific and complete treatment 
of reading disability would be helpful. Even the more efficient 
teachers are confronted with disabled readers. 

This text will undoubtedly become a favorite with students of 
procedures in teaching reading. The exposition is clear and the 
organization is good. Both the practical experience of the 
authors and their sound experimental background are clearly 
manifest throughout the treatise. MILES A. TINKER

University of Minnesota 


JAMES M. MACKINTOSH. The War and Mental Health in England.
New York: The Commonwealth Fund, 1944, pp. 91. 


How has the war affected the people of England in the various 
stages of its recent war experiences? Here is an opportunity to 
learn about the effect of drastic changes in situation on the men- 
tal health of the people. Rarely has there been an opportunity 
to observe and report such effects in a first-hand manner. In 
these essays Mackintosh, who is a professor of preventive medi- 
cine at the University of Glasgow, in a very interesting style tells 
us about England in revealing the mental health meanings of the 
attitudes expressed by the English people from the time of the 
great trade depression throughout the present war. These 
essays are presented under two large captions: (1) “The Impact
of War,” wherein are considered: The Process of Adjustment,
1939-1940; The Lonely Year, 1940-1941; Defense, Preparation,
and Alliance, 1941-1942; and The End of the Beginning, 1942-
1943; (2) “Mobilization for Peace,” which includes a considera-
tion of the future mental health needs of England in terms of
hospital services, voluntary organizations for mental health, 
professional education in mental health, and some future prob- 
lems of a general nature. 

During the period of the depression, Mackintosh reports, 
“England was regarded by her enemies as a fen of stagnant
waters. She was abused and flouted, and her only answer was
to lift the gentle finger of appeasement.” “Since the outbreak




510 The Journal of Educational Psychology 


of war,” he writes, “the majority of English people have fallen
into a habit of self-reproach for their past weaknesses; they stand 
condemned as lazy and selfish, when they ought to have been 
alert and ready to meet force with force. From the first moment 
of Japanese encroachment on world peace, it is argued, the 
democratic countries should have seen the red light; and every 
subsequent event in Italy, Germany, and China ought to have 
convinced any sane person that war was inevitable and that the 
race was to the swift and the battle to the strong. We were 
blind,” says he, “to these growing dangers because we did not
want to see them; we retired into a world of illusion because we 
were afraid of reality. The nation pleads guilty at the bar of 
history.” 

This well expresses the attitude of Mackintosh in his presenta- 
tions of the various attitudes and behavior manifestations that 
he talks about. The essays are, to use his own words, “lantern
slides in a rough time-sequence, not continuous records.” They
are well done; done by a man with an interest in mental hygiene, 
who has a good understanding of the country in which mental 
health practices are being applied and who has a clear perspective 
of the possibilities of a mental health program. Whether the 
detailed suggestions he makes concerning organization features 
for mental health in England are valid or not is something that 
one who is definitely more familiar with the interrelationships of 
England, Scotland, and Wales would be in a better position to 
say. In general, for future education Mackintosh favors central 
control or guidance from one school—for example, the London 
School of Economics. Americans can well profit by these 
experiences because to a large extent in America, too, we went 
through these attitudes, though at times years later and not 
always perhaps as blindly. H. MELTZER 

Psychological Service Center, St. Louis, Missouri 